INTRODUCTION


PURPOSE. 

The  goal  of  the  experimental  studies
reported  here  was  to  assess  the 
state-of-the-art  in  recognition  of  the 
spoken  word  by  means  of  computer 
technology  in  order  to  evaluate  its 
potential  usefulness  in  operational 
air  traffic  control  (ATC). 

BACKGROUND. 

The  initial  operational  capability  of 
automation  in  air  traffic  management 
was  never  expected  to  provide  the 
absolute  ultimate  in  either  efficiency 
or  safety.  It  was,  rather,  intended 
to  set  in  place  the  foundations  on 
which  continuing  evolutionary  improve- 
ments in  both  areas  could  be  built 
while  at  least  keeping  pace  with 
predicted  demand.  Substantial  effort 
is  currently  being  expended  to  improve 
the  quality  and  completeness  of  the 
raw  data  base  which  is  essential  in 
order  to  reap  optimum  benefit  from 
automation,  as  witness  the  development 
programs  in  beaconry,  communications, 
and  navigation.  For  the  present  and 
foreseeable  future,  however,  one  of 
the  critical  sources  of  complete, 
accurate,  and  timely  data  regarding 
the  instantaneous  and  projected 
traffic  situation  is  the  large  number 
of  human  operators  (traffic  con- 
trollers and  their  assistants), 
several  thousand  of  whom  are  on  duty 
at  any  instant  in  time. 

At  present,  there  is  only  one  channel 
through  which  controllers  can  transmit 
essential  facts  to  the  automation 
system:  through  their  fingers.  Many 
of  these  critical  items  of  information 
are  available  from  the  controller 
alone,  being  based  on  (or  resulting 
from)  decisions  he  has  made  on  the  one 


hand  or  having  originally  been  trans- 
mitted to  him  (as,  for  example,  by 
pilots  under  his  control).  The 
keyboard  "language"  that  must 
presently  be  used  to  communicate  these 
data  to  the  computer  system  is  arti- 
ficial, encoded,  almost  absolutely 
inflexible,  difficult  to  learn  and 
remember,  subject  to  error,  and  a 
source  of  distraction  to  the  user. 

Automation  in  traffic  control  systems,
even  initially,  has  improved  the 
quantities  and  qualities  of  informa- 
tion available  to  controllers  while 
relieving  them  (to  a limited  degree) 
of  some  of  their  former  mental,  vocal, 
and  manual  activity  (references 
1 and  2).  It  is  no  longer  necessary 
to  remember  target  identities,  nor  is 
frequent  radio  communication  necessary 
for  acquiring  or  reacquiring  identity, 
altitude,  and  speed  information  from 
pilots.  These  advantages,  however, 
are  secured  at  least  in  part  through 
the  imposition  of  new  or  altered  tasks 
upon  the  controller.  He  must  now 
manipulate  switches  and  keyboards — to 
modify  the  content  of  his  display,  to 
execute  certain  control  actions,  and 
to  update  the  computer  store  which 
provides  him  with  the  improved  infor- 
mation in  the  first  place.  The  data 
entry  workload  has,  in  fact,  necessi- 
tated increased  sector  staffing  in  a 
number  of  enroute  control  facilities. 
Thus,  a new  language  has  been  intro- 
duced into  the  world  of  air  traffic 
management — the  language  of  data  entry 
messages  to  computers. 

The  fact  of  the  matter  is,  however, 
that  all  of  these  "messages"  are 
composed  in  natural  human  language,
formulated  in  words,  phrases,  and 
sentences  and  many  (if  not  the 
majority)  of  them  must  be  communicated 
from  man  to  man  by  human  speech  as 
well  as  entered  into  the  digital 
computer  files  upon  which  the  system
is  based.  The  human  language  which
is  used,  furthermore,  is  a much 
restricted  subset  of  the  total 
repertoire  of  human  speech. 

The  substantive  vocabulary  for  any 
specific  ATC  operator  position  is  of 
the  order  of  three  hundred  "words" 
or  less.  The  number  of  kinds  of 
"messages"  or  "sentences"  is,  of 
course,  substantially  less  than  the 
number  of  "words"  (reference  3).  The 
structure  of  ATC  verbal  messages  is 
also  rather  rigid.  All  of  these 
factors — the  small  vocabulary,  limited 
message  set,  and  strict  syntax — tend 
to  reduce  the  problem  of  speech 
recognition  to  one  of  more  manageable 
size.  Furthermore,  it  is  not  neces- 
sary to  be  able  to  interpret  the 
speech  of  any  speaker  whatever,  but 
only  of  a limited  number  of  known 
speakers;  to  wit,  the  specific  con- 
trollers at  specific  positions  at  a 
given  time.  Nor  is  recognition  of  the 
speaker  required,  for  essentially  the 
same  reasons. 

It  has  been  widely  observed  that  the 
technology  of  isolated  word  recognition
is  "here"  (references  4,  5,  and
6).  "Isolated"  in  this  context  means 
only  that  the  word  must  have  a 
definable,  detectable  beginning  time 
and  end  time.  A "word"  may  be  multi- 
syllabic or,  indeed,  a rather  long 
phrase  so  long  as  it  is  uttered 
continuously  without  detectable  stops. 
Current  techniques  in  this  class  are 
capable  of  "word"  recognition 
accuracies  of  well  over  90  percent 
with  known  speakers  (i.e.,  speakers 
who  have  "pretrained"  the  device  to 
their  own  vocal  idiosyncrasies  by 
speaking  each  "word"  in  the  vocabulary 
several  times)  and  with  moderate-sized
vocabularies  (e.g.,  10  to  about
50  words).  Accuracies  of  98  to  100 
percent  have  been  obtained  where  the 
vocabulary  consists  of  digits  alone. 


A substantial  part  of  ground-to-air 
voice  communications  (about  20 
percent)  consists  precisely  of 
numbers,  while  the  "vocabulary"  of 
keyboard  data  entry  in  the  model 
A-3d-2  (Model  A)  enroute  system 
consists  almost  entirely  of  numbers 
and  letters.  Thus,  the  application  of 
voice  recognition  in  air  traffic 
control  does  not  necessarily  require 
interpretation  of  discursive  conversa- 
tion or  much  (if  any)  "understanding" 
of  "continuous  speech"  in  an  unlimited 
(or  even  very  large)  language.  While 
many  opinions  have  been  advanced 
regarding  the  applicability  of  voice 
data  entry  in  ATC  systems,  the  fact 
of  the  matter  is  that  the  question 
has  not  yet  been  systematically, 
experimentally  tested. 

TEST  OBJECTIVES. 

The  basic  questions  asked  of  the 
studies  reported  here  were  two: 

1.  Given  the  vocabulary  of  an  opera- 
tional ATC  data  entry  function,  what 
is  the  highest  order  of  accuracy  (or 
"reliability")  of  word  recognition  ob- 
tainable with  current  technology,  and 

2.  How  does  voice  data  entry  com- 
pare with  existing  keyboard  entry 
with  regard  to  accuracy,  speed, 
learnability,  and  acceptance  by
operators? 

Two  experiments  were  performed.  The 
first  was  designed  to  determine 
(a)  the  inherent  word-recognition 
accuracy  of  the  best  obtainable 
technology  using  a number  of  the 
subvocabularies  of  a representative 
data  entry  language  from  the  National 
Airspace  System  (NAS)  and  (b)  methods 
of  improving  recognition  accuracy 
wherever  it  might  be  found  less  than 
perfect.  The  second  experiment  was 
designed  to  compare  the  performance  of
the  word  recognition  method  of  data
entry  to  the  existing  keyboard  method. 

DESCRIPTION  OF  EQUIPMENT. 

A schematic  representation  of  the 
equipment  used  may  be  found  in  figure 
1 and  a photograph  of  the  assembly  in 
figure  2.  The  basic  device  used 
was  a Voice  Input  Processor,  model 
VIP-100,  manufactured  by  Threshold 
Technology,  Inc.,  of  Delran,  New 
Jersey.  This  device  was  chosen  based 
on  a National  Aviation  Facilities 
Experimental  Center  (NAFEC)  survey  of 
available  systems  and  on  the  basis  of 
surveys  performed  for  the  Naval 
Training  Equipment  Center  (NTEC)  as 
well  as  the  experience  of  NTEC  with 
the  same  model  equipment  in  studies 
performed  at  NTEC  (reference  7).  This 
equipment  functions  generally  in  the 
following  manner: 

1.  A single,  univocal  utterance  is 
spoken  into  the  microphone. 

2.  The  waveform  audio  energy  of  the 
utterance  is  transmitted  to  the 
audio  digitizer.  The  digitizer 
incorporates  32  audio  filters  or 
"features,"  16  of  which  are  of  the 
frequency/amplitude  type  spanning  the 
frequency  range  from  approximately  250 
through  5250  hertz  (Hz).  The 
other  16  filters  are  specially 
designed  to  detect  the  presence  of 
composite  or  unique  sounds  which  are 
characteristic  of  human  speech. 

3.  Every  2 milliseconds  (approxi- 
mately) the  digitizer  delivers  a 
32-bit  (one  per  filter)  digital  image 
of  the  immediately  preceding  audio 
signal  to  the  minicomputer,  from  the 
onset  of  the  utterance  to  the 
cessation  of  the  utterance. 

4.  The  software  in  the  mini- 
computer stores  and  saves  all  of  these 


"samples"  from  the  digitizer  and 
maintains  a count  of  them. 

5.  When  the  end  of  an  utterance  is 
detected,  the  minicomputer  time- 
normalizes  the  digitized  utterance. 
The  total  number  of  samples  (N)  in  the 
utterance  is  divided  by  16.  Each  N/16 
(sixteenth)  of  the  samples  is  then 
inspected,  feature  by  feature  (i.e., 
bit  by  bit)  for  the  presence  or 
absence  of  the  feature.  If  a feature 
is  found  in  one  quarter  or  more  of  the 
samples  within  that  particular 
sixteenth  of  the  whole  set  of  samples, 
a bit  is  set  in  another  (normalized)
array.  The  result  is  a composite, 
time-normalized  array  of  512  bits 
representing  the  presence  or  absence 
of  each  of  the  32  features  in  each 
of  16  time  segments  of  the  utterance. 

6.  This  digital  image  is  then  used, 
under  software  control,  in  one  of 
two  ways:

(a)  Initially,  it  is  used  to 
establish  a set  of  "word-prints"  for 
each  of  the  words  in  a particular 
vocabulary  for  a particular  individual 
speaker  (person). 

(b)  Subsequently,  once  a set  of
"word-prints"  or  "recognizer  training"
images  has  been  established,  the 
digital  image  is  compared  to  the 
preestablished  reference  "prints"  for 
purposes  of  detecting  the  best  match 
for  recognition.  The  "training 
images"  for  a particular  speaker  are 
usually  saved  on  a bulk  storage  medium 
such  as  cassette  tape  or  magnetic 
disk.  Thus,  they  need  not  be 
recreated  each  time  a particular 
speaker  operates  the  system. 
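
The time normalization described in step 5 can be sketched as follows. This is a minimal illustration and not the VIP-100 software: the function name `time_normalize`, the representation of each 2-millisecond sample as a 32-bit integer, and the way segment boundaries are chosen when N is not a multiple of 16 are all assumptions made for the example.

```python
from typing import List

NUM_FEATURES = 32   # one bit per digitizer filter
NUM_SEGMENTS = 16   # the utterance is divided into sixteenths


def time_normalize(samples: List[int]) -> List[int]:
    """Reduce N 32-bit samples to 16 words of 32 bits (512 bits in all).

    A feature bit is set for a segment when the feature is present in
    one quarter or more of the samples falling in that segment.
    """
    n = len(samples)
    if n < NUM_SEGMENTS:
        raise ValueError("utterance too short to divide into 16 segments")

    normalized = []
    for seg in range(NUM_SEGMENTS):
        # Integer boundaries (an assumption for N not divisible by 16).
        start = seg * n // NUM_SEGMENTS
        end = (seg + 1) * n // NUM_SEGMENTS
        segment = samples[start:end]

        word = 0
        for bit in range(NUM_FEATURES):
            hits = sum((s >> bit) & 1 for s in segment)
            if 4 * hits >= len(segment):   # one quarter or more
                word |= 1 << bit
        normalized.append(word)
    return normalized
```

At roughly one sample every 2 milliseconds, a half-second utterance yields about 250 samples, so each sixteenth holds about 15 or 16 samples and a feature must appear in at least 4 of them to be retained.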

The word recognized is translated
by the software into the American
Standard Code for Information
Interchange (ASCII) characters chosen
to symbolize it and displayed to the
operator on the Tektronix Model 4012
computer terminal display tube. The
spoken word "Amend," for example, would
be digitized, and the digital image
compared to a set of previously
established images for all of the
spoken words designating message
types. The best match would be
accepted (well over 96 percent of the
time it would be correctly recognized,
as will be seen below), and the
letters "AM" would be printed on
the display.

FIGURE 1. VOICE LABORATORY SCHEMATIC DIAGRAM

The other equipment in the laboratory
system served various purposes.
The cassette transports and the
magnetic disk were used for storage
and retrieval of programs, experimental
data, and "training" data for
the various operators who served as
experimental subjects. The audio
verification subsystem was designed
and constructed at NAFEC. It was
briefly evaluated for use as a
substitute for visual feedback to the
operator of the word recognized.
More will be said of this below in
connection with experiment II. The
DECwriter (Digital Equipment Corp.,
model LA-36) and the Teletype model
ASR-33 were used for system control,
programing, and data printout.


EXPERIMENT I


RATIONALE.

While word recognition technology has
been successfully applied in a
significant number of commercial
operations such as package routing and
manufacturing quality control inspection,
none of these have involved languages
with very large vocabularies or any
significant complexity of message
structure. Furthermore, the personnel
employed in these applications have
all been engaged in rather elementary
tasks: basically, visual reading of
labels or instruments or visual
observation of objects and conditions
and verbal utterance of these
observations, word by word. The only known
application in a task anything at
all similar to ATC has been that of
NTEC where a ground-controlled
approach trainer has been under
development (reference 7).

Several of the NAS keyboard data entry
"languages" were tabulated and
analyzed. There are two such
languages in regular and extensive use
in the semiautomated enroute traffic
control centers of the agency which
produce daily hundreds of thousands
of messages requiring millions of
keystrikes. There are a number of
other entry languages in the system
(e.g., control tower cab, terminal
radar control facility, flight service
station, etc.) which are either not as
burdensome or distracting, or not as
complex and voluminous in use, or
both, but which are also likely
candidates for application of word
recognition technology.

The key language which was chosen as
the test vehicle was that used by the
nonradar or flight data controllers in
enroute control centers. The
structure and vocabulary of this language
may be found in appendix A. This
particular language was selected for a
number of reasons. In the first
place, it is one of the more complex
languages in use. The total
repertoire of possible messages is
larger than that of any of the other
key languages used by personnel
engaged in the active control of
traffic. Finally, the key-entry
workload at this operational position
is the largest, in total volume, in
the system. Thus, a very difficult
application was undertaken for
investigation right at the outset.
The theory behind this choice was that
(a) it appeared highly likely, given
the state of the word recognition art,
that this application would be
practical and cost/beneficial and that, a
fortiori, less complex, less difficult
applications would yield to the same
approach with zero or minimum
additional research and development
effort or that (b) many or most of
the relevant questions for the lesser
applications would be answered in the
course of attacking the greater, even
if the present state of technology
did not prove practical for this
particular application.

PROCEDURE.

The language chosen for test was found
to include a total of 24 basic types
of message. (An additional seven
types of message covering "conflict
alert" entries have since been
added. This, based on experience to
date, should not cause any special
difficulty.) Of these, 15 types of
messages encompass 96 percent of
all messages actually entered in
operation. In addition, these
15 message types (see appendix A)
include all of those occurring with a
frequency of 1 in 100 or greater.

The first element of every message is
the message type. It was also found
that, in most cases, the type of
message must be followed by the
identity of the flight data file
(flight plan) to which the entry
applies. Furthermore, of the four
means of identifying a flight, the one
most commonly employed was the
three-decimal-digit computer identity number
assigned to every flight (reference
7). Thus, the second element of most
spoken messages could be assembled
from a word list consisting only of
digits plus two or three control words
(such as "erase" for restarting the
whole entry and "backspace" for
changing the last digit). The second
element of some types of message
(e.g., weather information retrieval)
and the third or fourth element of
other messages (e.g., early handoff to
a terminal; hold message) is a
location identifier or geographic "fix."
The keyboard codes for these place
names are not always mnemonic (e.g.,
Benton is coded 7QB), but the place
names themselves are easily spoken.
No attempt was made to survey all
possible fix-names; however, those
tested here included, for one sector
in the New York air route traffic
control center (ARTCC), all VOR's, all
intersections, and all terminals; in
short, all the fixes normally required
at the position as elements of
key-entry messages.

Two types of message (flight plan
amendment and correction) require
identification or naming of a flight
plan data field (e.g., assigned
altitude, speed). Eight of these data
fields account for the vast majority
of modifications entered, and the
field content or substantive data most
commonly consists of digits.

Certain types of entries or, more
precisely, parts of messages currently
made with keyboards basically exist
only in coded, nonverbal or partially
nonverbal form. Consider the aircraft
identity N1009Y (tail number). The
most convenient way to make such an
entry might still be via keyboard.
However, an "all purpose"
subvocabulary consisting of all of the digits
plus the phonetic alphabet (which is
part of the linguistic stock-in-trade
of the air traffic controller) was
made a part of the total vocabulary of
the voice data entry language for the
purpose  of  making  the  comparatively 
fewer  and  rarer  entries  not  already 
encompassed  by  the  word  lists 
described  above. 
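
The role of the control words in digit entry can be illustrated with a short sketch. It assumes only what is stated in the procedure: "erase" restarts the whole entry and "backspace" removes the last digit. The function name `assemble_entry` and the word-to-digit table are hypothetical and are not taken from the NAS language definition in appendix A.

```python
from typing import Iterable

# Hypothetical mapping from recognized words to keyboard digits.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def assemble_entry(words: Iterable[str]) -> str:
    """Build a digit string from recognized words and control words."""
    entry = []
    for word in words:
        if word == "erase":
            entry.clear()            # restart the whole entry
        elif word == "backspace":
            if entry:
                entry.pop()          # change (drop) the last digit
        elif word in DIGITS:
            entry.append(DIGITS[word])
        else:
            raise ValueError(f"word not in digits subvocabulary: {word!r}")
    return "".join(entry)
```

Under these assumptions, a three-digit computer identity spoken as "one, two, backspace, three, four" would be entered as 134.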

These  subvocabularies,  plus  a short 
list  of  commercial  aircraft  types 
and  the  list  of  relevant  avionics 
equipments  (or  type  "Qualifiers"), 
make  up  the  whole  vocabulary  as 
currently  constituted.  The  vocabulary 
and  syntax  of  the  language,  as 
previously  noted,  are  included  here  in 
appendix  A. 

The  first  experiment  conducted  was 
intended  to  establish  the  basic 
recognition  performance  of  the  VIP-100 
word  recognition  package  with  three  of 
the  subvocabularies  discussed  above; 
namely,  the  15  message  types,  the  21 
fix  names,  and  the  10  digits  (plus 
"erase"  and  "backspace")  list.  Each 
of  the  lists,  separately,  was  expanded 
into  a pseudorandom  assembly  in  which 
each  member  of  the  list  appeared  10 
times.  The  list  used  for  the  digits 
subvocabulary  may  be  found  in  appendix 
B.  Thus,  the  "reading  list"  for 
message  types  was  150  "words"  long; 
for  digits,  120  words;  and  for  fixes, 
210  words. 
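
The construction of a reading list can be sketched as below. The report states only that each list was a pseudorandom assembly with every member appearing 10 times; the use of a seeded shuffle and the name `build_reading_list` are assumptions for illustration, and the actual ordering used for the digits appears in appendix B.

```python
import random
from typing import List, Sequence


def build_reading_list(vocabulary: Sequence[str],
                       repetitions: int = 10,
                       seed: int = 0) -> List[str]:
    """Return a shuffled list in which each word appears `repetitions` times."""
    reading_list = [word for word in vocabulary for _ in range(repetitions)]
    # A seeded generator makes the pseudorandom order reproducible.
    random.Random(seed).shuffle(reading_list)
    return reading_list


# The digits subvocabulary: 10 digits plus the two control words.
digits_vocabulary = ["zero", "one", "two", "three", "four", "five", "six",
                     "seven", "eight", "nine", "erase", "backspace"]
digits_reading_list = build_reading_list(digits_vocabulary)
```

With 12 members the digits list is 120 words long, which agrees with the figure given above; 15 message types and 21 fix names give 150 and 210 in the same way.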

Each  speaker  "trained"  the  word 
recognizer  by  speaking  each  "word" 
(some,  as  may  be  seen  in  appendix  B,
were  composites  or  phrases  spoken 
without  internal  pauses)  10  times. 
This  resulted  in  composite  digital 
images  of  the  way  the  speaker  speaks 
each  of  that  particular  list  of  words. 
These  reference  images  were  then 
written  on  cassette  tape  for  later  use.


It  should  be  pointed  out  here  that 
this  "training"  process  was  conducted 
by  the  same-word-repetition  procedure 


that  was  essentially  built  into 
the  system  program  as  delivered. 
Other  users  of,  and  experimenters 
with,  this  type  of  equipment  use  this 
same  procedure.  For  example,  for  the 
vocabulary  consisting  of  digits,  the 
new  operator  first  speaks  the  word 
"zero"  10  times  in  succession  with  a 
brief  pause  (approximately  one-tenth 
of  a second)  between  successive 
enunciations.  Then  the  word  "one"
would  be  spoken  10  times  and  so  forth 
through  the  word  "nine."  This  is  an 
important  point  to  note  in  the  light 
of  discoveries  which  were  later  made 
during  attempts  to  improve  the 
accuracy  of  the  system. 

Following  the  initial  "training" 
session,  each  speaker  read  the  pseudo- 
random list  described  above  (now  for 
recognition)  in  10  separate  sessions
in  the  case  of  message  types  and
fixes,  and  in  5 sessions  for  the  digits  list.
Data  were  automatically  collected 
during  each  test  session  on  the  number 
of  times  each  word  was  correctly 
recognized  by  the  computer,  the  number 
of  times  incorrectly  recognized,  the 
average  closeness  of  match  between  the 
spoken  entry  and  the  best  and  second- 
best  choices  among  the  reference 
images  (i.e.,  the  training  images), 
and  the  duration  of  the  spoken 
expression.  Samples  of  raw  and 
processed  data  are  in  appendix  A. 
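
The comparison of a spoken entry with the reference images, and the recording of the best and second-best matches, might be sketched as follows. The report does not state the closeness measure or the rejection rule used by the equipment; the sketch assumes a simple count of differing bits (Hamming distance) over the 512-bit images and a hypothetical rejection threshold `reject_above`.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: List[int], b: List[int]) -> int:
    """Count differing bits between two images stored as lists of ints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def recognize(image: List[int],
              references: Dict[str, List[int]],
              reject_above: int) -> Tuple[Optional[str], int, Optional[int]]:
    """Return (word or None if rejected, best distance, second-best distance)."""
    if not references:
        raise ValueError("no reference images have been trained")
    ranked = sorted((hamming(image, ref), word)
                    for word, ref in references.items())
    best_distance, best_word = ranked[0]
    second_distance = ranked[1][0] if len(ranked) > 1 else None
    if best_distance > reject_above:      # assumed rejection rule
        return None, best_distance, second_distance
    return best_word, best_distance, second_distance
```

Logging both distances for every utterance gives the kind of closeness-of-match data described above; a small gap between best and second-best marks a word pair that is liable to confusion.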

Each  subject,  over  a period  of  several 
days  to  several  weeks,  spoke  (for 
recognition  testing)  each  word  in  each 
of  the  subvocabularies  100  times 
for  the  types  and  fixes  and  50  times 
for  the  digits.  The  principal  purpose 
of  testing  digits  was  to  ascertain 
whether  the  sample  of  speakers  pro- 
duced the  order  of  recognition 
accuracy  for  digits  which  is  commonly 
found  using  this  word  recognition 
equipment.

A  total  of  12  speakers  served  as  test
subjects  for  experiment  I.  Nine 
were  male  journeymen  ATC  specialists 
with  extensive  experience  in  the  NAS 
Enroute  Test  Facility.  Three  were 
noncontrollers,  two  female  and  one
male.  One  group  of  11  of  these 
speakers  served  as  subjects  for  the 
message  types  (9  male,  2 female)  and 
another  group  of  11  (10  male,  1 

female)  from  the  same  pool  of  speakers 
served  for  the  fix  names  and  the 
digits  subvocabularies. 

In  the  matter  of  user  familiarization 
and  operator  training,  several  impor- 
tant observations  were  made.  During 
the  test  series  for  each  speaker  with 
each  word  list,  recognition  accuracy 
and  "rejection"  data  were  processed  at 
least  after  every  second  session.  As 
a rule,  in  the  event  that  any  indi- 
vidual word  was  either  erroneously 
recognized  two  or  more  times  or 
rejected  as  unrecognizable  two  or  more 
times,  a new  set  of  "training"  data 
was  made  for  that  word  (and  for  the 
word  with  which  it  was  confused  if  the 
confusion  was  consistently  between  the 
same  two  words).  Thus,  as  recognition 
testing  proceeded,  the  quality  of  the 
reference  images  or  "training  data" 
for  some  of  the  words  in  each  list  for 
some  of  the  speakers  was  progressively 
refined.  This  does  not  mean  that  a 
great  deal  of  retraining  was  done.  A 
number  of  the  speakers  never  needed  to 
"retrain"  any  of  the  words  in  any  of 
the  lists  at  all.  For  example,  on  the 
average,  each  speaker  needed  to
retrain  one  word  one  time  for  the  list 
of  fixes.  Some  speakers  needed  to 
retrain  more  words  than  others, 
and  some  of  the  words  and  word  pairs 
were  more  troublesome  than  others; 
for  example,  the  fixes  Milton  and 
Benton  in  the  list  of  fix-names. 
Attempts  by  some  speakers  to  adopt  an 
extraordinary  (for  them)  pronunciation 
or  emphasis  in  an  attempt  to  improve 


recognition  of  a word  had  the  reverse 
effect.  Habitual  or  "natural"  expres- 
sion of  the  utterances  is  vital 
to  accuracy  of  recognition. 
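
The retraining rule described above can be expressed as a short sketch. The threshold of two errors or two rejects is taken from the text; reading "consistently between the same two words" as "every misrecognition of the word went to one and the same other word" is an assumption, as are the function name and the input format.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def words_to_retrain(errors: Iterable[Tuple[str, str]],
                     rejects: Iterable[str],
                     threshold: int = 2) -> Set[str]:
    """Select words needing new training data.

    errors  -- (spoken word, word recognized instead) pairs
    rejects -- spoken words that were rejected as unrecognizable
    """
    errors = list(errors)
    error_counts = Counter(spoken for spoken, _ in errors)
    reject_counts = Counter(rejects)

    flagged = {w for w, c in error_counts.items() if c >= threshold}
    flagged |= {w for w, c in reject_counts.items() if c >= threshold}

    retrain = set(flagged)
    for word in flagged:
        partners = {heard for spoken, heard in errors if spoken == word}
        # Retrain the confused partner only when the confusion is consistent.
        if error_counts[word] >= threshold and len(partners) == 1:
            retrain |= partners
    return retrain
```

For instance, two misrecognitions of "Milton" as "Benton" would select both fix names for retraining, while a single isolated error would select nothing.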

It  should  be  pointed  out  that  the 
operators  did  not  receive  feedback 
of  results  during  testing.  The 
experimenter  could  see  the  feedback 
display  but  the  operator  could  not.
The  only  indication  operators  received 
about  results  came  to  them  very 
indirectly  when  they  were  asked  to 
retrain  a word  or  words  as  noted 
above.

RESULTS. 

Tables  1,  2,  and  3 contain  the  results 
for  word  recognition  accuracy  of  the 
basic  Threshold  Technology  system  for 
the  three  subvocabularies  tested. 
Each  entry  in  tables  1 and  2 is  based 
on  1,100  voice  entries.  Each  of 
the  11  speakers  spoke  each  word 
for  recognition  by  the  system  100 
times.  Each  entry  in  table  3 is  based 
on  550  repetitions  of  each  word — each 
word  spoken  50  times  by  each  of  11 
speakers.

The  basic  data  represented  in  tables 
1,  2,  and  3 are  the  numbers  of  times 
each  word  was  misrecognized  as  some 
other  word  in  the  same  subvocabulary 
(i.e.,  errors)  and  the  number  of  times 
the  word  was  rejected  (i.e.,  not 
accepted  as  any  of  the  words  in  the 
subvocabulary).  There  are,  obviously, 
several  ways  that  "accuracy"  could  be 
defined  in  this  situation.  An  error 
(misrecognition)  by  a voice  entry 
system  is  certainly  undesirable, 
indeed  totally  undesirable.  Rejects, 
or  "refusals"  to  recognize  the 
utterance  at  all  cannot  cause  direct 
harm.  If,  however,  the  rejection  rate 
is  very  high  (for  example,  one  out  of 
two  utterances  rejected)  even  if  there 
are  no  errors  at  all  it  would  require 
TABLE 1. WORD RECOGNITION ACCURACY: MESSAGE-TYPE SUBVOCABULARY

                    NUMBER   NUMBER  PERCENT  PERCENT ERRORS
WORD                ERRORS  REJECTS   ERRORS    PLUS REJECTS

Amend                   25       27      2.3             4.7
Cancel                  62       21      5.6             7.5
Correction               3        2      0.3             0.4
Departure               20       75      1.8             8.6
Discrete Code            1        7      0.1             0.7
Readout                  1        8      0.1             0.8
Accept Handoff          27       48      2.5             6.8
Handoff                  9        3      0.8             1.1
Drop Track               4       20      0.4             2.2
Print Strip             13        8      1.2             1.9
Hold                     2        7      0.2             0.8
Release                  0        0      0.0             0.0
Report Altitude         21       45      1.9             6.0
Weather                  7       11      0.6             1.0
Transmit                32       10      2.9             3.8

Overall:               227      295      1.4             3.2
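
The two percentage columns of table 1 follow from the raw counts and the 1,100 voice entries per word. The sketch below reproduces them for three rows; the function name `figures_of_merit` and the rounding to one decimal place are assumptions chosen to match the tabulated precision.

```python
from typing import Tuple


def figures_of_merit(errors: int, rejects: int, n: int) -> Tuple[float, float]:
    """Return (percent errors, percent errors plus rejects) for one word."""
    percent_errors = 100.0 * errors / n
    percent_errors_plus_rejects = 100.0 * (errors + rejects) / n
    return round(percent_errors, 1), round(percent_errors_plus_rejects, 1)


N_TABLE_1 = 1100  # 11 speakers x 100 utterances of each word

amend = figures_of_merit(25, 27, N_TABLE_1)       # (2.3, 4.7)
cancel = figures_of_merit(62, 21, N_TABLE_1)      # (5.6, 7.5)
departure = figures_of_merit(20, 75, N_TABLE_1)   # (1.8, 8.6)
```

The same function applies to table 3 with n set to 550.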
TABLE 2. WORD RECOGNITION ACCURACY: FIXES SUBVOCABULARY

                    NUMBER   NUMBER  PERCENT  PERCENT ERRORS
WORD                ERRORS  REJECTS   ERRORS    PLUS REJECTS

Williamsport            35        8      3.2             3.9
Selingsgrove            12        9      1.1             1.9
Milton                  39       93      3.5            12.0
Hazleton                28       27      2.5             5.0
Wilkes-Barre             3        6      0.3             0.8
East Texas               1        8      0.1             0.8
Lake Henry               1        6      0.1             0.6
Tobyhanna                6        5      0.5             1.0
Allentown                2        5      0.2             0.6
Stillwater               2        3      0.2             0.5
Benton                  15       43      1.4             5.3
Sweet Valley             1        0      0.1             0.1
Lopez                    2        0      0.2             0.2
Snyders                  3        1      0.3             0.4
Slatington               7        2      0.6             0.8
White Haven             30        4      2.7             3.1
Resort                   8       23      0.7             2.8
Pennwell                 3       13      0.3             1.5
Huguenot                17       15      1.5             2.9
Solberg                 10        2      0.9             1.1
Freeland                 9       17      0.8             2.4

Overall:               224      290      1.0             2.2
TABLE 3. WORD RECOGNITION ACCURACY: DIGITS AND CONTROL WORDS SUBVOCABULARY

                    NUMBER   NUMBER  PERCENT  PERCENT ERRORS
                    ERRORS  REJECTS   ERRORS    PLUS REJECTS

the  operator  to  spend  a great  deal  of  The  message-type  subvocabulary  (table 
time  repeating  words  in  order  to  i)  showed  the  highest  overall  error 
complete  an  entry.  rate,  as  well  as  the  largest  number  of 

"standout"  results  in  terms  of  words 
A variety  of  ways  of  calculating  with  unusually  high  error  and/or 
figures  of  merit  can  be  envisioned,  reject  rates.  It  is  significant  in 
most  with  a legitimate  rationale.  The  this  connection  that  the  message-types 
two  methods  which  have  been  chosen  subvocabulary  was  the  first  contact; 
here  are  the  following:  that  any  of  the  operators  ever  had 

with  a word-recogniton  system.  It  has 
* (a)  For  each  word,  the  total  been  widely  observed  (reference  5, 

number  of  errors  (misrecognitions)  for  page  227,  for  example)  that  operators 
all  operators  divided  by  the  total  need  to,  and  do,  develop  a knack  of 
. number  of  entries.  For  tables  1 and  "talking  to  the  box;"  that  is,  in 

2,  as  noted  before,  the  total  number  addition  to  the  more  general  familiar- 
(N)  is  1,100  for  each  word.  For  Nation  effects,  such  as  the  develop- 
table  3,  N is  550.  ment  of  the  habit  of  speaking  at  a 

rather  uniform  volume  level,  after 
(b)  For  each  word,  the  total  of  several  sessions  of  making  voice 
errors  an£  rejects  divided  by  N . entries  operators  tend  to  fall  into  a 

natural, offhand mode of pronunciation
which contributes to recognition
accuracy.

In the first method, "percentage
error" is interpretable as the rate of
misrecognition, since the position is
taken that rejected entries are at
least not errors. The second method
may be interpreted as the maximum
of unacceptable responses of all
kinds made by the word recognition
system. The reader, of course, is
free to perform whatever calculations
may be desired; the raw numerical data
do not change. In fact, with values
of N as large as found here and
numbers of "errors" as small as found
here, the differences in the final
percentage values vary at most by
only tenths of a percentage point
regardless of the formula employed.

The principal features of note in
tables 1 through 3 are the very small
overall error rates for all three
subvocabularies and the fact that
individual members of each subvocabulary
were found to produce much
higher than average rates of errors or
rejections or both.

It must also be remembered that the
data presented in table 1 represent
all sessions and utterances, "warts
and all," including the earliest
sessions where there was no retraining
of troublesome words as well as the
later, more nearly error-free sessions
done after individual speaker/
utterance problems had been detected
and corrected.

In the 15-word message-types word
list (table 1) there were 7 "words"
which produced errors in excess of 1
percent. Two of these were composite
"words," such as "accept handoff" and
"report altitude." Some of the errors
and a major proportion of the rejects
produced by these words resulted from
the difficulty of articulating them
without any internal pause. This
problem, however, was confined to the
two speakers in one case and three in
the other who, for example, produced


two-thirds of the rejects found for
these expressions. The high error
rate for the word "cancel" was almost
entirely due to three of the eleven
speakers. This, in fact, was the
general case: where high error rates
were found for a "word," from half to
two-thirds of all the errors found
occurred in the data for 1, 2, or
sometimes 3 of the 11 speakers. Other
speakers had no special difficulty
with these words.

The second subvocabulary tested was
the place-names or fixes (table 2).
Seven of these words also produced
error rates of over 1 percent, and
again only four of them produced
errors greater than 2.5 percent. As
with the message types, in the
extreme cases (for example "Milton,"
"Whitehaven," and "Benton") half or
more of the errors and rejects were
found in the data for only one or two
of the speakers.

The last of the subvocabularies to be
tested was that which consisted
of the 10 decimal digits plus the 2
control words "erase" and "backspace"
(table 3). The control words were
included in this word list for initial
testing. In entry of whole messages,
as will be seen in experiment II,
these control words must be made a
part of every subvocabulary, since it
may be necessary to correct an error
or start over at any point in a
message.

Here, there were three words showing
an error rate greater than 1 percent,
but only one of these was over 2.5
percent. The two worst cases ("one"
and "eight") were again due primarily
to the data from only 1 of the 11
operators. The important things about
the results for this particular
word list were two. First, the
overall average rate of errors, just
under 1 percent, confirmed results
reported by the developers and other
experimenters with this technology.
Secondly, it was encouraging, since
such a large part of message content
in the languages of interest consists
principally of numerical data.

ACCURACY IMPROVEMENT STUDIES.

While the recognition accuracy data
for the subvocabularies of this
language were impressive overall, two
major considerations inspired a search
for methods of improvement. In the
first place, it must be remembered
that the "user" here is the air
traffic controller, and the principal
aim of voice data entry is reduction
of distraction from his or her main
concern, namely continuous observation
and management of the dynamic
four-dimensional traffic situation.
It is thus essential that detection
and correction of data entry errors be
brought to some irreducible minimum.
The second problem is that of individual
differences in voice recognition
accuracy from speaker to speaker.
While precision and clarity of speech
are of the essence of the craft of
ATC, some controllers necessarily will
speak with greater uniformity than
others. Thus, while the overall voice
recognition error rate for the
message-types subvocabulary was
less than 1.5 percent, individual
speaker error rates ranged from less
than 0.1 percent to nearly 7 percent.
With the "digits" subvocabulary, the
overall average error was less than 1
percent, while the range for individuals
was from zero to 2.3 percent.
Similar results were obtained for the
subvocabulary of fix names.

It was decided, therefore, to investigate
means of error reduction and/or
error correction which might be
applied to the basic VIP-100
recognition algorithm. The Naval
Training  Equipment  Center  was  con- 
sulted regarding  some  of  the  recogni- 
tion subroutines  that  had  been 
developed  there  for  increasing 
recognition  accuracy  in  their  applica- 
tion in  the  ground-controlled  approach 
trainer.  These  techniques  as  well 
as  a variation  of  the  same  general 
concept  which  was  developed  for  NAFEC 
by  Threshold  Technology  were  experi- 
mentally tried  with  the  nonradar 
controller  data  entry  language  being 
used  here.  The  net  result,  despite 
manipulation  of  the  parameters  of 
these  routines,  was  either  an  increase 
in  rejected  inputs  or  an  increase  in 
the  error  rate  or  both.  In  retrospect 
this  should  not  have  been  surprising, 
since  the  logic  of  these  techniques 
was  directed  principally  to  the 
solution  of  the  recognition  problem 
where  the  input  utterances  were  rela- 
tively long  and  largely  identical  with 
the  exception  of  a single  element. 

For  example,  the  expressions  "slightly 
(above/below)  glidepath"  can  be 
differentiated  with  greater  accuracy 
if  both  the  reference  and  the 
input  images  are  pared  down  to  only 
those  parts  which  are  nonidentical 
and  a "second  look"  taken  at  the 
correspondences.  This  precise  situa- 
tion did  not  obtain  in  the  word  lists 
used  here.  The  more  common  type  of 
problem  encountered  was  confusion  of 
some  of  the  pairs  of  words  within 
a subvocabulary.  The  words  "trans- 
mit" and  "print  strip"  in  the  message- 
type  list  and  the  words  "Williamsport" 
and  "Resort"  in  the  fix  names  list 
were  among  the  frequent  confusions. 
Oddly  enough,  even  though  the  word 
"nine"  (instead  of  "niner")  was  used 
in  the  digits  word  list,  and  nearly 
all  errors  involved  the  five/nine  and 
nine/five  confusions,  a very  high 
order  of  accuracy  was  obtained  for 
both  words. 


In  the  course  of  trying  out  various 
alternative  decision  subroutines 
for  error  reduction  and  in  reexamining 
our  original  detailed  data,  the 
experimenters  were  struck  by  some 
interesting  features  of  the  word 
durations.  For  every  utterance  in  the 
original  tests,  data  collection 
routines  had  recorded  the  word  numbers 
and  correlations  for  the  best  and 
second  best  matches  and  the  duration 
(i.e.,  number  of  audio  samples)  of  the 
input  utterance.  In  the  course  of 
time  normalization  of  utterances,  the 
standard  software  had  been  discarding 
this  information  after  use.  It  was  an 
interesting  curiosity  of  the  subvocab- 
ularies that  some  of  the  errors  that 
were  common  (such  as  Williamsport/ 
Resort  and  fix/backspace/erase)  were 
quite  reliably  distinguishable  on  the 
basis  of  utterance  duration. 

In  the  course  of  investigating  the 
utility  of  this  phenomenon  in  turn 
(the  experimenters  started  collecting 
utterance  duration  data  during 
the  "training"  or  reference  array 
construction  mode  of  operation),  it 
was  further  discovered  that  there  were 
systematic  differences  in  utterance 
duration during "training" versus
"recognition."  The  average  duration 
of  a word  spoken  repetitively  during 
training  frequently  differed  from  the 
average  duration  of  the  same  word 
spoken  in  a pseudorandom  sequence. 
Since  the  durations  differed  under 
the  two  conditions,  it  was  hypoth- 
esized that  the  correlations  obtained 
in  recognition  would  necessarily 
suffer. 

The  software  was  then  modified  in  two 
ways.  First,  training  was  changed 
so  that  the  speaker  was  presented  with 
a pseudorandom  prompting  list.  He  or 
she  did  not  simply  repeat  each  word  in 
the  list  10  times  in  succession,  but 
rather  10  times  within  the  same 


list, but seldom or never the same
word twice in succession and in an
unpredictable order. At the same
time, the average duration of each
word as well as the shortest and
longest obtained during training
were recorded and made a part of the
reference information. The recognition
decision algorithm was then
changed to make use of the duration
data. The basic logic is as follows:

1. The input word is digitized, time
normalized, and its duration is
noted.

2. The normalized feature array is
compared with the reference arrays
for all words in the subvocabulary,
and the routine returns with the
correlations for the best and
second-best matches.

3. If the correlations differ by more
than 40, the best match is selected
as correct.

4. If the correlations differ by 40
or less, the input word duration is
compared to the average (during
training) duration for the first and
second choice words unless the latter
two durations themselves differ by
less than 30 samples.

5. If the duration of the input word
is closer to the reference duration
of the first-choice word, it is
accepted as correct.

6. If the duration of the word is
closer to that of the second-choice
word, the input is rejected.

7. If the two reference durations
differ by 30 samples or less, the test
is not made, and the first choice word
is accepted as correct.

In addition to these changes in
the training and recognition
algorithms, a "tuneup" mode of operation
was added to the basic program.
In this mode of operation, the speaker
puts on and adjusts the headset,
adjusts the input volume setting, and
then starts reading the words in the
particular subvocabulary. The recognition
decision word is displayed on
the Tektronix terminal cathode ray
tube (CRT) and just below it, the
duration in samples of the word just
said and the average duration of the
first-choice (or recognition decision)
word. If the two durations are not
reasonably close (i.e., differ by
more than 10 or 15 samples) for
several of the words, even when
repeated several times, then the
headset placement and volume setting
are rechecked. This "tuneup" mode is
also useful for checking the effects
of a cold or other speech-altering
event and the need for "retraining"
specific words.

Having made new training data by the
pseudorandom repetition method, two
of the "better" (i.e., higher overall
recognition accuracy) and two of the
"poorer" speakers were retested on the
three subvocabularies previously used.
With only one exception (fix names for
one of the "better" subjects) the
difference between the average duration
of words in the training or
reference data and the average
duration of the same words under
recognition conditions decreased
substantially. With another similar
exception, the average correlations of
input words increased. That is to
say, the quality of the matches
between the inputs and their reference
images, on the whole, improved.

As might be expected, overall errors
of recognition were reduced. The
percentage error across all speakers
and  all  three  word  lists  went  from  1.0 
down  to  0.35  percent.  The  percentage 
of  rejects,  somewhat  surprisingly, 
went  from  1.3  down  to  0.8  percent. 
This  last  is  surprising  because  it 
was  expected  that  the  use  of  dura- 
tion information  in  the  recognition 
decision  logic  would  tend  to  increase 
the  reject  rate  by  rejecting  some 
doubtful,  atypical  but  correctly 
recognized  (on  the  basis  of  correla- 
tion alone)  spoken  inputs.  This  was  a 
trade  we  were  willing  to  make,  namely, 
the  exchange  of  rejects  for  errors. 
The  "cure"  for  a rejected  entry  is 
simple:  Say  it  again.  The  cure  for  an 
error  is  another  story  entirely. 
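
The seven-step decision logic listed earlier in this section can be
sketched in Python as follows. This is an illustrative sketch only, not
the VIP-100 software: the names Candidate and decide are hypothetical,
the thresholds of 40 (correlation) and 30 samples (duration) are taken
from the text, and two points the text leaves open are resolved by
assumption. Reference durations differing by exactly 30 samples are
treated as in step 7, and an input equally close to both reference
durations is accepted.

```python
from dataclasses import dataclass

CORRELATION_MARGIN = 40  # steps 3 and 4 in the text
DURATION_MARGIN = 30     # steps 4 and 7 in the text, in audio samples


@dataclass
class Candidate:
    """One match returned by the correlation routine (hypothetical type)."""
    word: str
    correlation: int     # match score against the stored reference array
    ref_duration: float  # average duration in samples during training


def decide(input_duration, best, second):
    """Return the accepted word, or None when the input is rejected."""
    # Step 3: a clear winner on correlation alone is accepted.
    if best.correlation - second.correlation > CORRELATION_MARGIN:
        return best.word
    # Step 7: reference durations too similar for the test to be useful.
    if abs(best.ref_duration - second.ref_duration) <= DURATION_MARGIN:
        return best.word
    # Steps 5 and 6: accept only if the input duration is at least as
    # close to the first choice as to the second (tie rule is assumed).
    gap_best = abs(input_duration - best.ref_duration)
    gap_second = abs(input_duration - second.ref_duration)
    return best.word if gap_best <= gap_second else None


# Hypothetical scores and durations for a commonly confused pair.
first = Candidate("Williamsport", correlation=510, ref_duration=140.0)
runner_up = Candidate("Resort", correlation=500, ref_duration=90.0)
print(decide(135, first, runner_up))
print(decide(95, first, runner_up))
```

With these made-up numbers the first call accepts the first choice and
the second call rejects the input, which is the exchange of a potential
error for a reject described above.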

Thus,  it  would  seem  that  the  modified 
training  routine  alone  solved  most 
of  the  problems  we  sought  to  solve. 
In  addition  to  this  effect,  the 
duration  test  in  the  decision  logic 
only  slightly  increased  the  reject 
rate  for  two  of  the  speakers  on  the 
list  of  fix  names,  while  the  error 
rate  for  both  was  reduced  to  zero. 
Indications  are,  overall,  that  use  of 
this  additional  information  will 
convert  a portion  of  the  potential 
errors  to  rejects  for  some  talkers. 
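
The modified training routine can be illustrated with a short Python
sketch. It is an assumption-laden example, not the program used in the
study: the function names are hypothetical, and the block-shuffle scheme
shown is only one simple way to obtain 10 repetitions of each word in an
unpredictable order without the same word occurring twice in succession.
The second function records the average, shortest, and longest training
durations that the text says were added to the reference information.

```python
import random


def prompt_list(words, reps=10, seed=None):
    """Build a prompting list in which each word appears `reps` times."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(reps):
        block = list(words)
        rng.shuffle(block)
        # If this block would start with the word that ended the previous
        # block, swap its first and last entries to avoid the repeat.
        if prompts and len(block) > 1 and block[0] == prompts[-1]:
            block[0], block[-1] = block[-1], block[0]
        prompts.extend(block)
    return prompts


def duration_stats(durations):
    """Summarize training durations (in samples) for one word."""
    return {
        "mean": sum(durations) / len(durations),
        "min": min(durations),
        "max": max(durations),
    }


digits = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "erase", "backspace"]
training_prompts = prompt_list(digits, reps=10, seed=1)
print(len(training_prompts))
```

A fully random draw with repair of adjacent repeats would match the
"seldom or never" wording more loosely; the block scheme is used here
only because its guarantees are easy to verify.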

Recognition  reliability  or  error  rate 
improved  for  both  the  "poorer"  and 
the  "better"  talkers  on  all  three 
subvocabularies with only two exceptions
wherein it simply remained the
same.  In  one  of  these  two  cases,  the 
error  rate  was  zero  under  the  original 
test  conditions  and,  obviously,  could 
not have been improved in any event.
The  improvements  for  the  "poorer" 
talkers  were  not  uniformly  dramatic, 
but  they  were  very  impressive  in  most 
cases. 

It  must  be  admitted  that  in  the 
follow-on studies reported here, we
were proceeding on a "pilot-study" or
"cut-and-try"  basis  until  the  very 
end.  Thus,  the  final  results  noted 
just  above  are  accounted  for  by  a 
combination  of  variables.  The 
training  procedure  was  changed,  the 
"tune-up"  feature  was  added,  and  the 
decision  logic  was  modified.  In 
addition,  there  may  have  been  some 
unknown  quantity  of  "Hawthorne  Effect" 
upon  the  "poorer"  talkers  who 
worked  closely  with  the  experimenters 
through  the  cut-and-try  phase  of  the 
experimentation.  The  "acid  test"  of 
the  objective  changes  should  properly 
be  made  with  a new  sample  of  subjects. 
On  the  whole,  however,  we  feel  that  we 
substantially realized our goal, which
was  reduction  of  recognition  error  as 
close  to  the  vanishing  point  as 
possible  given  the  technology  at  hand. 

OTHER  FINDINGS. 

Colds  and  allergies  which  affect  the 
characteristics  of  speech  were  found 
to degrade recognition quality.
However,  for  two  of  three  speakers 
who  among  them  contracted  three  head 
colds  and  one  allergy  during  the 
test  series,  no  serious  problems  were 
encountered.  For  these  two  speakers, 
it  was  necessary  to  retrain  only  a 
few  of  the  words  in  the  list  to 
recover  the  near-perfect  recognition 
previously  found. 

One  speaker  contracted  a second  cold 
after  several  weeks.  It  was  only 
necessary  to  read  into  the  system  the 
training  data  modified  for  the  first 
cold  in  order  to  achieve  the  same 
recognition  quality  as  produced  by  the 
"normal  speech"  training  data. 
Another  speaker,  however,  despite 
major  efforts  at  retraining  specific 
words,  was  unable  to  regain  a high 
recognition  accuracy  while  a cold 
persisted.


It  should  be  noted  that  the  overall 
data  for  recognition  of  message-type 
entries  which  have  already  been 
discussed  (table  1)  include  the 
error  data  from  this  speaker  which 
accounts  for  approximately  half  the 
total  errors  encountered  with  this 
particular  subvocabulary.  When  this 
speaker  was  not  suffering  from  a 
serious  cold,  his  results  were  quite 
comparable  to  those  of  other  speakers. 

Retests  were  also  run  with  most  of 
the  original  12  speakers  using  the 
last  (and  best)  set  of  training,  or 
reference,  data  recorded  during  the 
initial  reliability  testing  phase. 
Retests  were  made  after  approximately 
3 months  and  again  after  approximately 
6 months  following  the  last  of  the 
original  test  series.  Both  accuracy 
and  reject  results  were  almost  iden- 
tical to  those  found  in  the  initial 
test  series. 

Finally,  microphone  quality  and 
placement  were  found  to  be  factors 
of  influence.  While  fully  systematic 
testing  of  these  variables  was 
not  conducted,  three  different  (but 
all  "noise  canceling")  microphone 
types  with  different  mountings  (one 
hand-held,  two  headset  or  headband) 
were  employed  at  various  times.  The 
hand-held  microphone  was  used  by  three 
of  the  speakers  during  the  testing  of 
the  15-word  message-type  list  and 
accounts,  in  part,  for  the  slightly 
lower  overall  accuracy  rate  found  for 
that list than for the others.
Careless,  inconsistent,  or  unusual 
placement  of  microphones  (e.g.,  at  or 
below  chin  height,  more  than  an  inch 
from  the  corner  of  the  mouth  in  the 
horizontal  plane)  immediately  elicits 
a high  reject  rate  because  of  loss  of 
signal  strength  and  can  quickly  be 
corrected  by  the  user.  The  microphone 
used by all but one subject for the
"digits" subvocabulary is directly
substitutable in existing ATC operations
for the carbon-type microphones
required  by  the  communications  systems 
employed  today.  This  microphone 
produced  excellent  results.  Micro- 
phone technology  has  also  improved 
since  these  experiments  began,  and 
some  experimenters  report  significant 
performance  improvements  due  to 
microphones  alone. 


EXPERIMENT  II 


RATIONALE.

It  is  one  thing,  of  course,  to  secure 
a high  order  of  recognition  accuracy 
(greater  than  99  percent  for  even  the 
least  proficient  speakers  after  the 
incorporation  of  improvements  in  the 
training  procedure  and  the  recognition 
algorithm)  for  separate  parts  of  a 
total  language.  It  is  quite  another 
to  generalize  this  performance  to  data 
entry  in  total  real  jobs-of-work.  The 
operational  tasks  for  which  word- 
recognition  technology  was  being 
evaluated  involve  the  entry  of  whole 
messages,  not  just  single  words.  A 
typical  example  would  require  the 
entry  of  an  orderly  sequence  of 
utterances  which  convey  the  intention 
to  amend  a flight  data  store,  the 
identity  of  the  store  or  file,  and 
the  specific  modification  to  be  made. 
A number  of  examples  may  be  found  in 
appendices  A and  D.  The  purpose  of 
experiment  II,  therefore,  was  to 
make  basic  comparisons  between  the 
entry of whole messages by voice
versus  entry  of  the  same  messages  by 
the  keyboard  method  currently  in 
operational  use. 

PROCEDURE.

The  language  chosen  for  test  purposes 
was that typical of the nonradar
control position in the ATC center.
Two  hundred  messages  were  constructed 
in  two  sets  of  one  hundred  each.  Each 
hundred  messages  consisted  of  exactly: 

-- 20 flight plan amendments.
(Ten required amendment of only one
data field, five amended two fields,
and five amended three fields.)

-- 16 departure messages. Eight
of these included the optional
altitude entry; eight did not.

-- 14 flight plan readout requests

-- 13 handoff entries

-- 12 handoff acceptance entries

-- 7 flight strip printout
requests

-- 6 weather information requests

-- 5 drop track (and flight plan)
entries

-- 2 flight plan cancellations

-- 1 each: (a) early transmission
of flight plan to a terminal,
(b) entry of a reported altitude
from a flight without an altitude
transponder, (c) track holding
message, (d) track released from
holding message, and (e) discrete
beacon code assignment entry.

The  format  of  the  messages  may  be 
found  in  appendix  A,  and  a sample  of 
25  of  the  messages  may  be  found  in 
appendix  D. 

Each  set  of  100  messages  was  written 
out  on  individual  cards  in  narrative, 
descriptive  form  as  a requirement  to 
make  an  entry  and  not  as  a sequence 
of  words  to  be  said.  The  100  cards  in 
each  instance  were  shuffled  into  more 
or less random order. The random
sequence of messages was then
transcribed onto printed sheets, 25
messages to a sheet. Appendix D
contains one of the total of eight
message sheets that were produced.
This  whole  process  resulted  in  a 
standardized  set  of  messages  which 
contained  nonradar  controller  entries 
with  frequencies  representing  those 
found  in  actual  control  sectors  in  the 
field. The large number of messages
prevented the operators from learning
messages or the sequences of operations
required to enter them. Thus,
every message, one by one, had to be
"translated" from the descriptive
form in which it was presented into
a sequence of spoken words (or
keystrikes) necessary to compose the
message in machine acceptable form.
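
The construction of one 100-message set can be sketched in Python as
follows. The counts are those listed above; the short labels, the
function name build_sheets, and the use of a seeded shuffle in place of
hand-shuffled cards are assumptions made for illustration.

```python
import random

# Messages per set of 100, as listed in the text (labels are shorthand).
MESSAGE_MIX = {
    "flight plan amendment": 20,
    "departure": 16,
    "flight plan readout request": 14,
    "handoff": 13,
    "handoff acceptance": 12,
    "flight strip printout request": 7,
    "weather information request": 6,
    "drop track": 5,
    "flight plan cancellation": 2,
    "early transmission of flight plan": 1,
    "reported altitude": 1,
    "track hold": 1,
    "track release from hold": 1,
    "discrete beacon code assignment": 1,
}


def build_sheets(mix, per_sheet=25, seed=None):
    """Shuffle one set of messages and split it into printed sheets."""
    deck = [kind for kind, count in mix.items() for _ in range(count)]
    random.Random(seed).shuffle(deck)
    return [deck[i:i + per_sheet] for i in range(0, len(deck), per_sheet)]


sheets = build_sheets(MESSAGE_MIX, seed=7)
print(sum(MESSAGE_MIX.values()), len(sheets))
```

Two such sets of four sheets each give the eight message sheets
mentioned above.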

Each  of  the  experimental  operators  was 
given a copy of the operator's manual
(appendix  A)  several  weeks  in  advance 
of  any  data  collection.  A schedule 
of  test  sessions  was  arranged  indi- 
vidually with  each  operator.  Five 
operators  completed  the  whole  test 
series.  All  were  ATC  specialists, 
four  male  and  one  female.  Two  of  the 
men  had  had  extensive  keyboard  data 
entry  experience  in  the  NAFEC  Enroute 
System  Support  Facility  but  none  of 
this  within  the  previous  two  years. 
One  of  these  had  also  served  as  a test 
operator  in  experiment  I and  was  thus 
more  familiar  with  the  voice  entry 
system  than  the  other  four.  Three  of 
the  operators  started  with  and  com- 
pleted the  voice  entry  system  first 
followed  by  the  keyboard  system.  Two 
started  with  the  keyboard  entry  system 
and  finished  with  the  voice  entry 
system. 

Prior  to  data  collection  using  the 
voice  entry  method,  each  operator: 

1. Was given a complete demonstration
of the operation of the system by
the experimenter.

2. Was given preliminary training and
familiarization  with  voice  entry 
by  creating  sets  of  word-prints 
(machine  "training")  for  the  subvocab- 
ularies tested  in  experiment  I and 
generating  several  sets  of  recognition 
data  for  these  word  lists. 

3.  Created  an  initial  set  of  word- 
prints  for  the  entire  103-word 
vocabulary  of  the  voice  entry 
language  (see  appendix  A).  This 
was  accomplished  by  using  the 
pseudorandom  training  method  developed 
as  previously  described  and, 

4.  Entered  a set  of  practice  messages 
(see  appendix  A). 

Prior  to  data  collection  using  the 
keyboard  method,  the  vocabulary  and 
syntax of the keyboard language were
explained  to  each  operator,  and  a set 
of  practice  messages  was  entered.  At 
all  times  during  data  collection  under 
both  voice  and  keyboard  systems,  a 
chart  was  available  to  the  operator 
showing  the  vocabulary  and  message 
structure  of  the  language  in  use  at 
the  time.  In  addition,  if  the  oper- 
ator had  difficulty  formulating 
any  message,  he  or  she  was  per- 
mitted to  ask  the  experimenter  for 
instructions.  In  practice  in  the 
field,  controllers  are  issued  and/or 
have  available  for  reference  a pocket 
reminder  card  which  describes  the 
required  format  and  content  of  the 
various  messages  in  the  keyboard 
language.  It  was  found  during  the 
experimentation  that  there  was  a 
"learning"  function  for  both  languages 
which  continued  through  the  first 
two  or  three  hundred  messages  entered. 
In  fact,  for  the  one-in-a-hundred 
types  of  messages  (such  as  entry  of 
reported  altitude,  etc.,  see  message 
distribution  described  above)  this 
learning  function  continued  right 
through  the  end  of  the  experiment. 


This too reflects operational
experience: the vocabulary and format
of the rare types of messages are not
well remembered; thus the need for
the "reminder card" noted above.

Data  were  collected  on  the  entry  of 
100  messages  (4  sets  of  25)  in  a 
single  session.  The  operator  reported 
to  the  laboratory,  was  seated 
comfortably  facing  the  Tektronix 
terminal  display  and  donned  the 
microphone  headset,  positioning  the 
microphone  approximately  in  the 
recommended  position.  The  reference 
or  word-print  data  for  the  specific 
operator were read into program
storage and the program started in
"tuneup" mode. The operator then
spoke a number of words, usually
digits and message-type words, while
checking microphone placement and
input volume setting to achieve an
approximate  match  between  input 
word  duration  (as  displayed  on  the 
terminal)  and  reference  duration  (also 
displayed  for  each  word). 
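
The tuneup check can be sketched in Python as follows. In the study the
comparison was made by eye from the terminal display, so this is only an
illustration of the rule of thumb given in experiment I (durations
differing by more than 10 or 15 samples for several words, even when
repeated). The function name, the tolerance of 15 samples, and the
reading of "several" as three words are assumptions.

```python
from collections import defaultdict


def needs_recheck(observations, tolerance=15, min_words=3):
    """Decide whether headset placement and volume should be rechecked.

    observations: iterable of (word, spoken_duration, reference_duration),
    with durations in audio samples. A word counts as persistently off
    only if every repetition of it misses the reference by more than
    `tolerance` samples.
    """
    gaps = defaultdict(list)
    for word, spoken, reference in observations:
        gaps[word].append(abs(spoken - reference))
    persistent = [w for w, g in gaps.items() if min(g) > tolerance]
    return len(persistent) >= min_words


# Hypothetical tuneup readings for four words, one of them repeated.
readings = [
    ("one", 100, 130), ("one", 102, 130),
    ("two", 90, 120), ("five", 80, 112), ("nine", 100, 104),
]
print(needs_recheck(readings))
```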

At this time, any necessary retraining
of vocabulary words was accomplished.
Any words which produced
recognition errors or frequent rejections
during the previous session were
retrained at this time, and the
training data thus modified were
stored for use in the ensuing data
collection session. The experimenter
then entered the identity of
the  output  data  file,  handed  the 
operator  the  first  set  of  25  message 
descriptions, and entered a "start"
signal  at  the  computer  console.  The 
operator  then  proceeded  to  translate 
the  message  descriptions  into  the 
sequence  of  words  required  to  compose 
and  enter  the  messages  one  at  a time. 
When  the  operator  said  the  last  word 
in  the  last  message,  the  experimenter 
entered  a "stop"  signal  at  the  control 
console and closed the data file. The
operator was then given a short rest
after  which  the  process  was  repeated 
for  the  entry  of  the  second  set  of  25 
messages  and  similarly  with  the  third 
and  fourth.  In  this  manner,  a 
total  of  100  messages  was  entered  at 
one  sitting.  Generally,  not  more  than 
100  messages  were  entered  on  any  given 
day  by  any  one  operator.  In  any  given 
day,  several  operators  would  usually 
be  scheduled.  The  data  files  were 
usually  processed  the  same  day  as 
collected.  In  this  way  operator 
omission of whole messages, processor
failures, and similar rare events
could  be  detected  and  corrected  at 
once. 

An  almost  identical  procedure  was 
followed  for  keyboard  data  entry 
sessions,  except  that  no  microphone  or 
control  setting  tuneup  was  required. 
The  keyboard  messages  were  entered 
through  the  standard  keyboard  which 
was  an  integral  part  of  the  Tektronix 
terminal  console  which  also  provided 
the  operator  display.  In  both  voice 
and  keyboard  entry  methods  each 
message  was  displayed  on  the  Tektronix 
display  word  by  word  or  key  by  key  as 
it  was  being  entered  by  the  operator. 
The  operator  could  thus  check  the 
composition  and  accuracy  of  the 
message  as  it  was  being  assembled  and 
could  make  corrections  or  clear  and 
reenter  the  message  at  any  time  prior 
to  the  Enter  signal.  The  Enter  signal 
was  the  word  "go"  in  the  voice  system 
and  the  carriage  return  key  in  the 
keyboard  system. 

DATA  COLLECTION  AND  PROCESSING. 

The  data  collection  subroutines  of  the 
computer  programs  developed  and 
used  here  for  both  voice  and  keyboard 
entries  performed  several  functions, 
namely:


1.  The  "start"  signal  entered  by  the 
experimenter  caused  a real-time 
clock to be read and recorded. The
"stop" signal caused the clock to be
read  a second  time.  The  difference, 
in  seconds,  was  calculated  and  printed 
out  at  the  control  console  at  the  end 
of a set of 25 messages.

2.  The  entry  of  a message  type 
(either  the  first  word  in  a voice 
message  or  the  first  key  in  a keyboard 
message)  caused  a real-time  clock  to 
be  read.  The  entry  of  the  "go"  word 
or  the  carriage  return,  respectively, 
signaled  the  completion  of  the  actual 
message,  causing  the  clock  to  be 
read again and the difference to be
calculated. 

3.  The  data  collection  software 
maintained,  for  each  message,  a record 
of  every  word  recognized,  every  word 
rejected,  every  backspace,  and  every 
erasure  in  the  voice  system  and  every 
key  struck  in  the  keyboard  system. 
All  of  these  together  with  the  time 
elapsed  between  first  and  last  entry 
in  the  message  (item  2,  above) 
were  written  message  by  message 
sequentially  as  entered  into  a disk 
store  file  for  later  processing. 

Samples  of  complete  data  files, 
selected  to  illustrate  all  of  the 
possible events recorded, may be found
in  appendix  E. 
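
The three data collection functions listed above can be sketched in
Python as follows. This is not the original software, and the actual
file layout is the one shown in appendix E; the class names, event
labels, and in-memory record structure here are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class MessageRecord:
    """Every event in one message, in entry order, plus elapsed time."""
    events: List[Tuple[str, Optional[str]]] = field(default_factory=list)
    started: float = 0.0
    elapsed: Optional[float] = None


class SessionLog:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.records: List[MessageRecord] = []
        self.current: Optional[MessageRecord] = None
        self.session_start = 0.0
        self.session_seconds: Optional[float] = None

    def start(self):
        # Function 1: experimenter's "start" signal reads the clock.
        self.session_start = self.clock()

    def stop(self):
        # Function 1: "stop" reads it again; the difference is kept.
        self.session_seconds = self.clock() - self.session_start

    def event(self, kind, value=None):
        # Function 2: the first entry of a message starts its own timer.
        if self.current is None:
            self.current = MessageRecord(started=self.clock())
        # Function 3: words, rejects, backspaces, erasures, or keys.
        self.current.events.append((kind, value))
        if kind == "enter":  # the "go" word or the carriage return
            self.current.elapsed = self.clock() - self.current.started
            self.records.append(self.current)
            self.current = None
```

Passing the clock in as a parameter is a design choice made here so that
the timing logic can be exercised with a scripted clock.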

The  data  processing  software  was 
designed  to  perform  an  exhaustive 
tabulation  of  the  following  quantities 
for  each  data  file  representing  the 
entry  of  25  messages: 

1.  The  total  number  of  characters 
in  completed  messages.  Messages 
with  language  or  format  errors  and 
messages with elements omitted or
added by the operator were printed
out,  together  with  the  "correct" 
message  (from  a master  list)  for 
visual  analysis. 

2.  The  total  numbers  of  backspace  and 
erase  entries  and,  for  the  voice 
system  only,  the  number  of  entries 
rejected  as  not  recognizable. 

3.  The  number  of  erroneous  characters 
in  completed  messages.  The  master 
or  correct  message  in  each  case  was 
compared,  character  by  character,  with 
the  message  entered  by  the  operator 
with  either  system.  Errors  in  mes- 
sages which  were  printed  out  because 
of  format  and  language  errors  were 
visually  counted  and  added  to  the 
totals  calculated  by  the  program. 
The  vast  majority  of  messages  was 
completely  processed  by  machine,  and 
very little visual/manual processing
was  required.  Samples  of  processed 
data  files  may  be  found  in  appendix  F. 
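
The character-by-character comparison in item 3 can be sketched in
Python as follows. The function name and the sample strings are
hypothetical (the real message formats are given in appendix A), and
treating any length mismatch as a case for visual analysis is an
assumption about how omitted or added elements were detected.

```python
def count_char_errors(master, entered):
    """Count erroneous characters in an entered message.

    Returns None when the lengths differ, standing in for the messages
    that were printed out beside the master copy for visual analysis.
    """
    if len(master) != len(entered):
        return None
    return sum(1 for m, e in zip(master, entered) if m != e)


print(count_char_errors("HO 123 45", "HO 128 45"))
```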

Five  of  the  15  types  of  messages  which 
are  common  at  the  enroute  nonradar 
control  position  were  "shorthand"  or 
very  brief  types  of  entries  also 
quite  commonly  entered  by  the  radar 
controller.  Three  of  these  require, 
for  example,  entry  of  only  the  message 
type  (executed  by  a single  "quick 
action"  button  in  a separate  key  pack 
at  the  radar  position)  plus  the 
track  identity  (most  commonly  three 
digits  entered  through  the  numeric  key 
pack  or  the  alphanumeric  keyboard). 
These  are  the  "accept  handoff,"  "drop 
track,"  and  "flight  plan  readout" 
messages.  Two  others,  the  "handoff" 
and  "reported  altitude"  messages, 
require  a single-key  message-type 
entry,  identity,  and  either  two  digits 
(for  handoff,  to  identify  sector  if 
the  handoff  is  made  to  other  than  the 
expected  or  "normal"  sector)  or  three 
digits  (altitude  entry).  Handoff  and 
accept handoff messages, in fact,
constitute over two-thirds of all
messages  commonly  entered  by  the  radar 
controller  (reference  7).  Within 
the  total  of  3,000  messages  entered  by 
the  five  operators  using  each  of  the 
entry  systems,  there  were  1,350 
messages  (in  total)  of  these  five 
types.  Thus,  though  the  total  number 
of  keystrikes  required  to  enter 
each of these messages by the keyboard
method in this experiment is
greater  by  exactly  one  at  the  nonradar 
position,  it  was  considered  that 
accuracy  and  other  measures  for  these 
five  types  of  message  considered 
separately would provide some indication
of the relative merits of
voice  versus  key  entry  at  the  radar 
position.  Therefore,  every  set  of  raw 
data  was  processed  twice,  once  to 
summarize  performance  over  the  whole 
set  of  15  "high  frequency"  nonradar 
position  entries  and  again  to 
summarize  performance  for  the  subset 
of  radar  position  entries.  It  should 
be remembered in considering the
detailed  results  in  the  tables 
below  (except  for  data  on  "translation 
times"  where  the  distinction  does  not 
appear)  that  the  results  identified  as 
"R  Position"  are  a subset  of  those 
labeled  "D  Position"  and  are,  in  fact, 
contained  within  the  overall  summary 
results  tabulated  as  "D  Position" 
findings.

RESULTS. 

Table 4 presents the numbers of errors
of all kinds which were found in
the messages as finally completed and
entered; that is to say, the errors
(either operator committed or word
recognition errors or both) that
remained undetected and uncorrected by
the operators. There are three kinds
of errors:

1. Language errors (both voice and
keyboard) made by the operators.
An example here is entry of the words
"drop track," or the key code "RS,"
when the entry should have been
"cancel" or "CN," respectively, for
cancellation of a flight plan.

2. Format errors made by the
operator. Examples here are the
omission of message field delimiters
(or "punctuation" so to speak) such as
spaces between fields or entry of a
field designation where the format did
not require one.

3. Character errors. These were
single letter or digit errors in
the message as entered and could arise
from misrecognition of a spoken
word, speaking the wrong word, or
striking the wrong key.

TABLE 4. MESSAGE ENTRY ERRORS

[Table body illegible in source. Surviving headings: Voice and
Keyboard, each with D Position and R Position; rows by Operator,
Per Hundred Messages, and Overall; columns include Char and Form.
One surviving value: 2.7 (keyboard, per hundred messages).]

In table 4 the errors are tabulated
for all messages entered by each
operator separately. Errors are also
summarized as the number per hundred
messages, where the total number of
messages entered by all subjects was
3,000 for "D Position" messages and
1,350 for "R Position" messages.

An  important  feature  to  note  in  the 
voice  entry  results  is  the  total 
absence  of  format  errors.  This  is 
accounted  for  solely  by  the  fact  that 
the  voice  system,  of  its  nature, 
requires  a "computer  in  front  of  the 
computer."  That  is,  format  control  is 
an  essential  part  of  the  voice  system. 
For  example,  the  first  entry  in  a 
message  must  be  the  message  type.  The 
syntax  built  into  the  voice  entry 
program  makes  it  impossible  for 
the  first  entry  to  be  anything 
else, whether the word said or recognized
is correct or incorrect.
Obviously,  if  the  same  control  were 
applied  to  key  entries  (which,  in  the 
field,  it  is  not — format  is  not 
inspected  until  a whole  message  is 


entered)  format  errors  would  be 
virtually  impossible  with  key  entries 
either. 
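The "computer in front of the computer" idea can be sketched as a position-by-position syntax check. The following is a minimal illustration only: the message type names come from the text, but the field layouts, digit counts, and function names are simplifying assumptions, and how the real voice program enforced its syntax is not described here.

```python
# Hypothetical message formats: the type names come from the text, but the
# field layouts below are simplified assumptions for illustration only.
MESSAGE_FORMATS = {
    "accept handoff": ["track_id"],
    "drop track": ["track_id"],
    "handoff": ["track_id", "sector"],
}

# Number of digits assumed for each field kind.
FIELD_DIGITS = {"track_id": 3, "sector": 2}


def entry_allowed(entries_so_far, new_entry):
    """Return True if new_entry is syntactically legal at this position."""
    if not entries_so_far:
        # The first entry must always be a message type.
        return new_entry in MESSAGE_FORMATS
    fields = MESSAGE_FORMATS[entries_so_far[0]]
    position = len(entries_so_far) - 1  # index of the field being filled
    if position >= len(fields):
        return False  # message is already complete
    digits = FIELD_DIGITS[fields[position]]
    return new_entry.isdigit() and len(new_entry) == digits


print(entry_allowed([], "123"))                 # a field cannot come first
print(entry_allowed([], "handoff"))             # a message type can
print(entry_allowed(["handoff", "123"], "27"))  # sector follows identity
```

Checking each entry as it arrives, rather than inspecting the whole message at the end, is what makes a format error impossible to complete in this sketch.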

The  important  comparisons  to  be  made 
in  table  4,  considering  first  the 
nonradar  or  D Position  messages  taken 
as  a whole,  are  as  follows: 

1.  Language  errors  are  three  times  as 
frequent  with  the  keyboard  method 
as  with  voice. 

2. Single key or character errors are
at least one-third more frequent
with key entry than with voice.

3.  If  one  considers  the  matter  of 
errors  of  all  possible  kinds  and 
accepts  the  built-in  impossibility  of 
format  errors  in  the  voice  system  as  a 
real  benefit,  the  advantage  of  the 
voice  system  seems  quite  clear.  The 
key  system  produces  nearly  three  times 
as  many  entry  errors  overall. 

With  respect  to  language  errors,  the 
difference  can  be  accounted  for  mainly 
on  the  basis  of  the  differences  in 
the  mental  encoding  or  learning  or 
remembering  processes.  The  voice 
system  uses  a "natural"  word  code, 
while  the  key  system  uses  a 
(necessarily)  contrived  or  artificial 
code.  There  is  further  evidence  of 
this  effect  in  the  "translation  times" 
presented  and  described  below.  Thus, 
the intuitive hypothesis of an advantage
to the natural language method
of  data  entry  appears  to  be  tenable. 
There  are  fairly  large  individual 
differences  from  operator  to  operator 
but  they  appear  to  be  much  greater  for 
the  keyboard  system.  Operator  number 
3 produced  nearly  a third  of  the 
language  errors  found  with  the  voice 
system,  while  operators  1 and  2 
produced  about  three-fourths  of 
the  language  errors  in  the  key 
system. Curiously enough, operator
number  1 was  one  of  the  two  who  had 
substantial  prior  experience  in 
the  system  support  facility  for 
enroute  development  which  has  been 
previously  noted. 

As  regards  single-character  errors 
found  in  the  voice  entry  results, 
exactly  half  of  the  character  errors 
are  found  in  the  data  for  operator 
number  5.  This  operator  was  the  only 
one  who  insisted  on  using  a hand-held 
microphone  for  entry  of  the  first  500 
messages,  finally  becoming  so  annoyed 
at  the  results  he  was  getting  that  he 
switched  to  a head-mounted  microphone. 
Character  errors  for  this  operator 
thereafter  virtually  disappeared.  If 
the  data  for  this  one  operator  are 
omitted  from  both  voice  and  keyboard 
results,  the  character  error  rate  per 
hundred  messages  for  key  entry  is  more 
than  twice  as  high  as  for  voice  entry, 
4.5 versus 1.9 per hundred, respectively.
This, of course, is a clear-cut
illustration of the sensitivity
of  word  recognition  to  microphone 
quality,  technique,  and  positioning 
of  which  more  will  be  said  below. 

It  is  true  that  not  all  errors  are 
equally  serious.  The  operational 
key  entry  system  will  detect  many 
format  and  language  errors  and  call 
them  to  the  attention  of  the  operator 
(by  rejecting  entry  of  the  whole 
message),  thus  preventing  their  entry 
into  the  system.  Single-character 
errors,  however,  are  much  more  likely 
to  escape  detection.  There  is  much  to 
be  said  for  a system  which  reduces  the 
likelihood  of  error  in  the  first 
place. 

If  attention  is  focused  on  the  error 
pattern  for  radar  position  types 
of  messages  in  table  4,  language 
errors  are  less  frequent  with  both 
entry  methods  (0.5  per  hundred  for 


voice  and  1.6  per  hundred  for 
keyboard).  If  the  one  operator 
(number  1)  who  produced  13  language 
errors  in  225  messages  using  the 
keyboard  is  overlooked  as  being 
anomalous,  even  this  difference 
disappears,  and  the  rate  for  key  entry 
shrinks to 0.6 per hundred. Similarly
with single key or character
errors, if this one operator is
not included, the character error
rates become nearly identical at
1.2 to 1.3 per hundred with both systems.
Thus,  with  the  exception  of  format 
control,  the  voice  system  did  not 
appear  to  offer  any  advantage  with  the 
types  of  messages  commonly  entered  at 
the  radar  controller  position. 

It  should  also  be  pointed  out  here 
that  the  large  number  of  character 
errors  (47)  attributable  to  operator 
number  5 with  the  voice  system  is 
precisely  due  to  use  of  the  hand-held 
microphone  and  disappeared  when 
the  change  was  made  to  a headset 
microphone.  None  of  these  results  is 
surprising  if  it  is  remembered  that 
this  subset  of  five  message  types 
comprises  the  briefest  and  simplest 
messages  in  the  whole  possible 
repertoire.  On  the  other  hand, 
if  the  effects  of  individual  operator 
differences  are  allowed  to  stand, 
the  keyboard  system  can  be  seen  to 
have  produced  about  two  to  three  times 
the  number  of  errors  of  all  kinds  per 
hundred  messages,  even  with  the 
radar  position  subset. 
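The error rates quoted in this section are all normalized to a count per hundred messages. A minimal sketch of that normalization follows; the message totals are those stated in the text, while the error counts and the function name are hypothetical and chosen only to show the arithmetic.

```python
def errors_per_hundred(error_count, message_count):
    """Return the number of errors per 100 messages entered."""
    if message_count <= 0:
        raise ValueError("message_count must be positive")
    return 100.0 * error_count / message_count


# Message totals stated in the text.
D_POSITION_MESSAGES = 3000
R_POSITION_MESSAGES = 1350

# Hypothetical error counts, for illustration only.
print(errors_per_hundred(30, D_POSITION_MESSAGES))
print(errors_per_hundred(27, R_POSITION_MESSAGES))
```

Because the two message sets differ in size, the same raw error count corresponds to a rate more than twice as high in the R Position subset.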

Table  5 presents  a summary  of  the  data 
entry  rate  measurements.  Here  too, 
the  results  are  tabulated  for  the 
whole  D Position  message  set  and 
separately  for  the  R Position  subset. 
The individual entries were calculated
by dividing the total of characters
entered by the total of all the entry
times.
TABLE 5. DATA ENTRY RATES, CHARACTERS PER SECOND

Voice

              D Position        R Position
Operator      Min.    Max.      Min.    Max.
1             0.9     1.6       1.0     1.3
2             1.5     1.8       1.7     2.3
3             1.2     1.6       1.2     2.0
4             1.1     1.5       1.4     2.6
5             1.1     1.9       1.1     2.1
Average       1.2     1.7       1.3     2.1

Keyboard

              D Position        R Position
Operator      Min.    Max.      Min.    Max.
1             1.1     1.6       1.9     3.3
2             1.6     2.0       2.6     3.0
3             1.1     1.6       1.6     2.3
4             1.1     1.7       1.7     2.5
5             1.3     1.9       1.5     2.7
Average       1.2     1.8       1.9     2.8
It will be remembered that an "entry
time" was measured for each message,
which  was  the  time  between  the  first 
word  or  keystrike  in  the  message 
and  the  last  word  or  keystrike.  The 
total  numbers  of  characters  were  also 
counted,  and  the  ratio  between  the  two 
constitutes  a number  of  characters  per 
second. 
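The rate calculation described above amounts to one division over a set of messages. The sketch below assumes hypothetical per-message character counts and entry times; none of the sample values come from the experiment.

```python
def entry_rate(char_counts, entry_times_s):
    """Characters per second over a set of messages.

    char_counts   -- characters entered in each message
    entry_times_s -- seconds from first to last word or keystrike, per message
    """
    if len(char_counts) != len(entry_times_s):
        raise ValueError("one entry time is needed per message")
    total_time = sum(entry_times_s)
    if total_time <= 0:
        raise ValueError("total entry time must be positive")
    return sum(char_counts) / total_time


# Hypothetical three-message set, for illustration only.
chars = [12, 8, 20]
times = [8.0, 6.0, 11.0]
print(entry_rate(chars, times))
```

Summing before dividing weights each message by its length, which differs from averaging the per-message rates.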

The  rates  set  forth  in  table  5 are  the 
minimum  and  maximum  rates  for  each 
operator  under  each  of  the  conditions; 
i.e.,  the  slowest  and  fastest  sets 
of  100  D Position  (or  subsets  of  45  R 
Position)  messages  out  of  the  six  sets 
entered  by  each  operator.  It  is 
easily  seen  in  table  5 that  for  the 
full  set  of  D Position  messages  there 
was no difference of practical importance
in the data rates for voice as
compared  to  keyboard  entry,  the 
characters/second  varying  between  1 
and  2 for  both. 

R Position  entries,  however,  show  a 
30-  to  50-percent  advantage  for  the 
keyboard  system.  The  difference  in 
this  case  can  be  ascribed  to  two 
factors.  The  R Position  messages,  as 
previously noted, were "short and
sweet" and consisted mostly of digits.
They  are  simple  and  quick  to  learn 
and  easy  to  remember,  and  the  keys 
required  for  the  bulk  of  each  entry 
are  close  together.  While  this  same 
simplicity  applies  to  voice  entries  of 
these  messages,  the  necessity  of 
speaking each word with clear separation
slows the process down and gives
the key system a speed advantage. At
the same time, this sort of key entry
is  not  solely  a touch-typing  activity 
and  necessarily  distracts  the  visual 
attention  of  the  operator.  He  or  she 
always  looks  at  the  keys  (and  less 
often  at  the  message  being  composed  on 
the  display)  while  entering  the 
message.  In  voice  operation  this  is 


not  true — there  is  not  nearly  the 
shifting  of  visual  attention  from 
message  list,  to  keyboard,  to  display. 

These  results  tend  to  confirm  the 
general findings of other investigators
(references 8 and 9) that a
limited  capability  for  processing 
continuous  (unhalting)  speech  would  be 
a very  important  requirement  in  the 
development  and  application  of  this 
technology.  A word  recognition  system 
which  would  process  continuously 
vocalized  two-,  three-,  or  four-digit 
numbers  would,  in  all  likelihood, 
eliminate  or  reverse  the  advantage  of 
the  keyboard  in  the  R Position  message 
subset  and  produce  a clear  speed 
advantage  to  voice  in  the  D Position 
messages. 

Table 6 presents a summary of the
"translation time" data. It will be
remembered that two sets of time data
were collected; namely, the total
times to read, translate, and enter
messages and the total times consumed
in actual entry alone, either speaking
or keying from first to last element
of the messages. Obviously, some of
the "translation" actually took
place during the entry process.

The  differences,  in  seconds,  between 
the  total  times  for  each  100  messages 
for each operator and the sum of the
"entry" times are shown in table 6.
Again there were one or two outstanding
individual differences;
for  example,  operator  number  1 in  the 
keyboard  entry  situation  and  operator 
number  2 in  the  voice  method.  The 
important  feature  of  this  set  of  data, 
however,  is  that  there  is  an  obvious, 
consistent  advantage  to  the  voice 
system. Translation, message composition,
and related elements of the
data  entry  process  were  markedly 
facilitated  by  the  voice  entry  system. 


TABLE 6. MESSAGE TRANSLATION TIMES
(TOTAL SECONDS)

Voice

                          Session*
Operator        1      2      3      4      5      6
1             509    328    230    340    362    370
2             225    290    162    169    135    205
3             388    246    238    250    247    255
4             564     **    297    312    274    252
5             493    358    385    370    349    226
Average       436    306    262    288    273    262

Keyboard

                          Session*
Operator        1      2      3      4      5      6
1           1,071    939    808    794    660    644
2             637    213    319    617    598    724
3             608    585    540    569    322    625
4             534    507    548    460    411    393
5             582    487    370    425    267    393
Average       686    546    517    573    452    556

* A session included 100 messages.
** Clock failure, data lost.
This part of the process appeared
very  early  in  the  experience  of  the 
operators,  after  only  one  or  two 
hundred  messages  had  been  entered,  and 
the  advantage  remained  substantial 
thereafter.  The  translation  process 
for  the  keyboard  system  took  50 
percent  longer  at  the  very  beginning 
of  the  test  series  and  quickly  "rose" 
to  the  vicinity  of  100  percent  longer 
as  the  advantage  of  the  voice  system 
increased.  The  advantage  remained 
even  after  500  messages  had  been 
entered.  The  difference,  at  the  end, 
was that between an average of about
2.5 seconds translation time per
message with the voice system versus
5.5 seconds per message using the
keyboard language.
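The translation time figures can be retraced from the session averages in table 6. The sketch below is illustrative: the averages are taken from the table, while the function names are assumptions.

```python
def translation_time(total_time_s, entry_time_s):
    """Time spent reading and translating: total time minus entry time."""
    return total_time_s - entry_time_s


def percent_longer(keyboard_s, voice_s):
    """Percentage by which the keyboard time exceeds the voice time."""
    return 100.0 * (keyboard_s - voice_s) / voice_s


# Session averages from table 6 (total seconds per 100 messages).
VOICE_AVG = [436, 306, 262, 288, 273, 262]
KEYBOARD_AVG = [686, 546, 517, 573, 452, 556]
MESSAGES_PER_SESSION = 100

for session, (v, k) in enumerate(zip(VOICE_AVG, KEYBOARD_AVG), start=1):
    print(session,
          round(percent_longer(k, v)),
          round(v / MESSAGES_PER_SESSION, 1),
          round(k / MESSAGES_PER_SESSION, 1))
```

The last session works out to roughly 2.6 and 5.6 seconds per message, in line with the approximate figures given in the text.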


Tables  7 and  8 summarize  the  numbers 
of  "backspace"  and  "erase"  entries 
made  in  each  100  messages  by  each  of 
the  operators  using  the  two  entry 
systems.  These  reflect  errors  which 
were  detected  and  corrected  prior  to 
final  entry  of  a completed  message. 
The  detected  and  corrected  errors 
include  only  operator  errors  in  the 
keyboard  system  and  both  operator  and 
word-recognition  errors  in  the  voice 
system. Table 7 summarizes the
results  for  the  whole  D Position  set 
of  messages.  While  the  overall 
average  numbers  of  corrections 
for  the  two  methods  appear  to  show 
about  a 33-percent  advantage  to  the 
keyboard  system  (10.8  corrections  per 
hundred  keyboard  messages  versus 
15.7  per  hundred  for  voice)  it  can 
also  be  seen  that  this  difference  is 
almost  totally  accounted  for  by  the 
results  from  one  operator  (number  4). 
If  data  for  this  one  subject  are  not 
included in calculations, the advantage
is much diminished. The
difference  in  numbers  of  corrections 
per  hundred  messages  in  favor  of  the 
keyboard  is  reduced  by  half. 
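The effect of setting one operator aside can be recomputed from the per-session counts in table 7. The sketch below uses those counts as tabulated; the function name and data layout are illustrative, and the printed values are simply what the tabulated counts yield.

```python
# Corrections per session (100 messages each), by operator, from table 7.
VOICE = {
    1: [17, 15, 29, 13, 14, 11],
    2: [4, 7, 4, 15, 13, 12],
    3: [8, 7, 4, 5, 5, 11],
    4: [32, 51, 31, 41, 40, 15],
    5: [10, 18, 18, 12, 5, 4],
}
KEYBOARD = {
    1: [6, 7, 4, 8, 16, 21],
    2: [3, 17, 13, 14, 6, 19],
    3: [11, 10, 3, 15, 18, 20],
    4: [16, 13, 9, 12, 20, 6],
    5: [7, 11, 4, 5, 4, 5],
}


def corrections_per_hundred(counts, exclude=()):
    """Corrections per 100 messages, optionally leaving operators out."""
    kept = {op: c for op, c in counts.items() if op not in exclude}
    sessions = sum(len(c) for c in kept.values())  # each is 100 messages
    return sum(sum(c) for c in kept.values()) / sessions


for excluded in ((), (4,)):
    print(excluded,
          round(corrections_per_hundred(VOICE, excluded), 1),
          round(corrections_per_hundred(KEYBOARD, excluded), 1))
```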
A similar situation may be seen in
table  8.  If  the  results  for  all 
five  operators  are  included  in  an 
overall  rate  of  corrections  per 
hundred  messages,  the  rate  is  nearly 
twice  as  high  for  the  voice  system  as 
for  keyboard  in  the  subset  of  R 
Position  messages.  However,  operator 
4,  who  stood  out  from  all  others  in 
this  respect  under  the  voice  entry 
condition  for  the  whole  class  of  D 
Position  messages  again  accounts  for 
nearly  all  of  this  difference.  If 
only  results  for  the  other  four 
operators  are  considered,  the 
difference  effectively  disappears. 

This  one  operator  had  a great  deal  of 
difficulty  adapting  to  the  requirement 
of  the  voice  system  that  words  be 
enunciated  separately  and  only  began 
to  develop  this  skill  in  the  last  100 
messages  or  so.  Again,  much  of  the 
negative  effect  in  this  instance  might 
be  eliminated  if  a degree  of  limited 
continuous  speech  recognition  emerges 
in  this  area  of  technology. 

Finally,  considering  the  overall 
results  for  words  rejected  by  the 
voice  system,  there  was  a surprisingly 
large  reject  rate.  The  average  100 
messages,  as  entered  through  the  voice 
system,  required  756  spoken  words  or 
utterances.  (See  appendix  D for  the 
count  of  utterances  in  a sample  of 
25  messages.) 

The  3,000  messages  entered  by  all 
operators  taken  together  required 
22,680  utterances.  An  additional 
2,617  utterances  were  rejected  in 
total.  Thus  about  10  percent  of  all 
words  vocalized  were  rejected  as 
unrecognizable.  This  is  an  extremely 
high rate compared to that of 1 percent
or less found in experiment I.
The  causes  of  the  high  reject  rate, 
however,  are  reasonably  clear. 
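The reject rate quoted above follows directly from the utterance counts. A short sketch of the arithmetic, using only figures given in the text (the variable names are illustrative):

```python
UTTERANCES_PER_100_MESSAGES = 756
TOTAL_MESSAGES = 3000
REJECTED_UTTERANCES = 2617

# Utterances needed for the messages as finally entered.
accepted = UTTERANCES_PER_100_MESSAGES * TOTAL_MESSAGES // 100

# Rejects as a share of every word vocalized (accepted plus rejected).
reject_percent = 100.0 * REJECTED_UTTERANCES / (accepted + REJECTED_UTTERANCES)

print(accepted)
print(round(reject_percent, 1))
```

The denominator here is all words vocalized, matching the wording of the text; dividing by accepted utterances alone would give a somewhat higher figure.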
TABLE 7. ERRORS DETECTED AND CORRECTED ("BACKSPACES" PLUS "ERASURES")
FOR D POSITION MESSAGES

Voice

                          Session*
Operator        1      2      3      4      5      6
1              17     15     29     13     14     11
2               4      7      4     15     13     12
3               8      7      4      5      5     11
4              32     51     31     41     40     15
5              10     18     18     12      5      4
Per Hundred
Messages     14.2   19.6   17.2   17.2   15.4   10.6

Overall      15.7

Keyboard

                          Session*
Operator        1      2      3      4      5      6
1               6      7      4      8     16     21
2               3     17     13     14      6     19
3              11     10      3     15     18     20
4              16     13      9     12     20      6
5               7     11      4      5      4      5
Per Hundred
Messages      8.6   11.6    6.6   10.8   12.8   14.2

Overall      10.8

* A session included 100 messages.
TABLE 8. ERRORS DETECTED AND CORRECTED ("BACKSPACES" PLUS "ERASURES")
FOR R POSITION MESSAGES

Voice

                          Session*
Operator        1      2      3      4      5      6
1               8      5      9      1      7      7
2               1      0      0      2      1      1
3               3      3      0      2      1      1
4               8     16      8     13     14      5
5               9      3      3      2      5      0
Per Hundred
Messages     12.8   12.0    8.9    8.9   10.7    6.2

Overall       9.9

Keyboard

                          Session*
Operator        1      2      3      4      5      6
1               3      0      0      3      2      0
2               0      1      3      0      3      3
3               0      1      1      5     10      1
4               6      3      0      7      3      0
5               2      1      2      1      1      3
Per Hundred
Messages      4.8    2.7    2.7    7.1    8.4    3.1

Overall       4.8

* A session included 45 "R Position" messages.


Basically,  all  of  those  factors  which 
have  been  found  to  reduce  the  quality 
of  the  voice  input  and,  necessarily, 
the  quality  of  the  response  of  the 
word  recognition  system,  came  into 
play  and  were  aggravated  during 
the  entry  of  whole  messages  in 
this  experiment  as  distinguished 
from  the  entry  of  single  words  in 
subvocabularies. Principal among
these  for  all  of  the  operators  was  the 
speaking  cadence  problem.  It  is 
absolutely  essential  to  provide 
a brief  (approximately  1/10  second) 
"silence"  between  successive  words. 
Two of the operators developed this
facility very early in the testing;
three others were beginning to develop
it only near the end of the testing.
A second  major  cause  of  this  result 
for  all  subjects  was  the  microphone 
quality  and  placement  problem. 
Although some pains were taken to
"tune up" and check microphone placement
and volume control settings at
the  beginning  of  each  voice  entry 
session,  the  head-mounted  microphones 
would  move  or  be  moved  by  operators 
and  voice  amplitude  would  change  over 
time.  Attempts  were  made  during 
testing  to  reduce  the  reject  problem 
by  "retraining"  or  producing  new 
word-prints  for  words  which  were 
giving  high  reject  rates,  but  by  and 
large  this  approach  was  nonproductive. 
The  one  subject  previously  noted  who 
used  a hand-held  microphone  for  five 
out  of  the  six  sessions  with  the  voice 
system  produced  a significant  fraction 
of  the  total  count  of  rejects.  When 
this  operator  changed  to  a headset 
microphone,  the  reject  rate  dropped  by 
a factor  of  five. 
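The cadence requirement described above, a silence of roughly 1/10 second between successive words, can be illustrated with a simple gap check. This is a sketch under stated assumptions: the word timings are hypothetical, the function name is illustrative, and the recognizer's actual word-boundary logic is not described in this report.

```python
MIN_GAP_S = 0.1  # approximate inter-word silence cited in the text


def short_gaps(words, min_gap_s=MIN_GAP_S):
    """Return adjacent word pairs separated by less than min_gap_s.

    words -- list of (word, start_s, end_s) tuples in spoken order
    """
    flagged = []
    for (first, _, first_end), (second, second_start, _) in zip(words, words[1:]):
        if second_start - first_end < min_gap_s:
            flagged.append((first, second))
    return flagged


# Hypothetical timings for a spoken handoff message.
utterance = [
    ("handoff", 0.00, 0.50),
    ("one", 0.70, 0.95),
    ("two", 1.00, 1.25),   # only 0.05 s after "one"
    ("three", 1.45, 1.75),
]
print(short_gaps(utterance))
```

In this example only the pair spoken too close together is flagged, which is the kind of input the text reports as leading to a reject or a recognition error.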

Here  once  more,  development  of 
even  the  limited  continuous-speech 
capability  previously  mentioned  would 
reduce  the  reject  problem  markedly 
and  quite  probably  to  acceptable 
proportions.  The  operators  who 


frequently  forgot  to  space  out  the 
spoken  digits  separately  would  be 
rewarded, most of the time, by three or
four  correctly  recognized  digits 
rather  than  held  back  and  forced 
to  repeat  by  one  or  more  reject 
signals. 

OTHER  FINDINGS. 

During  the  experimentation  described 
above,  NAFEC  engineers  working  on 
the  flight  service  station  (FSS) 
improvement  program  developed  a 
practical  and  inexpensive  device  which 
provided  the  capability  of  encoding 
human  speech  in  digital  form,  storing 
the  digital  representation,  and 
reproducing  it  at  will.  A version  of 
this  device  was  constructed  and 
attached  to  the  voice  recognition 
system.  The  purpose  of  this  effort 
was  to  permit  trial  and,  if  practical, 
the implementation of audio verification
of messages entered, either by
voice  or  by  keyboard.  Software  was 
also  developed  to  store,  filter, 
retrieve,  and  reproduce  the  digitized 
speech, and a number of voice files
were  created  representing  the  data 
entry  language  of  the  nonradar 
controller.  In  operation,  this 
system  functioned  as  follows: 

1.  The  data  entry  operator  spoke  the 
sequence  of  words  necessary  to  compose 
a complete  message. 

2.  Immediately  after  the  operator 
spoke  the  word  "go"  which  signaled 
completion  of  the  message,  the 
sequence  of  words  which  the  system  had 
recognized  was  repeated  back  over  a 
loudspeaker  to  the  operator.  These 
were not the operator's own words but
rather  the  sequence  of  words  that  the 
recognition  device  "thought"  he  had 
said, now retrieved from a disk store
made  from  the  speech  of  another 
person,  reconstituted  into  audio  form 
and  output  through  a speaker. 
It was found in trials by a number of
operators  that,  given  the  data  entry 
language  being  used,  the  process  of 
audio  verification  was  slow  and  too 
transitory  to  be  conveniently  used. 
The  output  rate  is  controllable, 
but  it  was  found  that  an  output  rate 
faster  than  the  original  input  rate 
(successive  words  separated  by  less 
than  approximately  1/10  second  between 
words)  was  too  fast  for  complete 
comprehension  by  the  operator.  Even 
this  fairly  rapid  rate,  however,  made 
the  message  verification  process  much 
slower  than  visual  checking  of  the 
message  as  it  was  composed  on  the 
operator's  display.  In  addition,  it 
was  found  difficult  to  "visualize"  the 
sequential  position  of  an  error 
detected  in  the  once  spoken  and  now 
"gone"  audio  output.  Thus,  correction 
of  errors  detected  was  more  difficult 
than  it  was  in  the  case  of  visual 
verification.  As  a result,  further 
application  of  auditory  feedback 
was  suspended  prior  to  experiment  II. 
This  capability,  however,  is  expected 
to  have  utility  in  other  applications 
of  word  recognition  technology  such  as 
the development of automated pseudopilots
for ATC simulations and
controller  training. 


ANALYSIS 


The  results  of  the  experiments 
reported  here  present  a mixed  picture 
regarding  the  potential  operational 
value  of  word  recognition  technology, 
at  its  present  state  of  development, 
as  a substitute  for  the  keyboards 
presently  used  as  a means  of  traffic 
control  data  entry. 

In  general,  voice  data  entry  is  at 
least  as  effective  overall  as  current 
keyboard  methods.  In  several  respects 
it  offers  demonstrable  advantages. 


Because of its natural language,
voice entry produced fewer
language  errors  by  a factor  of  three 
and  saved  5 minutes  of  translation 
time  per  100  messages  entered  as 
compared  to  keyboard  entry.  Because 
of  the  high  rate  of  word  recognition 
accuracy, voice entry resulted in 33
to 50 percent fewer single-character
errors  (equivalent  to  mis-struck  keys) 
in  completed,  entered  messages 
than  keyboard  entry.  Because  of  the 
sheer  presence  of  the  word  recognition 
equipment which included a minicomputer
with software format control,
the  voice  system  absolutely  prevented 
errors  in  message  format  which  were 
rather  common  (up  to  4 per  hundred 
messages)  with  the  keyboard  system. 
The  substantial  saving  in  translation 
time  previously  noted  also  reflects  an 
important  reduction  in  distraction  of 
thought  and  attention  to  the  data 
entry  task.  While  the  gains  to  safety 
and  efficiency  in  the  traffic  control 
process  resulting  from  all  of  these 
advantages  would  be  very  difficult  to 
quantify,  they  are  clearly  nontrivial. 

On  the  other  hand,  the  keyboard  entry 
system  as  currently  employed  in 
the  operational  enroute  control  system 
showed  two  advantages  over  the  voice 
system.  In  the  subset  of  radar 
controller  messages  entered  by  both 
systems,  there  was  a 50  percent  higher 
data  entry  rate  with  the  keyboard  than 
the voice system. While this advantage
to the keyboard did not appear in
the nonradar controller messages (and
was, in fact, neutralized by a faster
entry rate for the additional 10 types
of messages in the nonradar set), it
is  an  advantage  of  some  import  to  the 
evaluations  reported  here.  The 
principal,  if  not  sole,  reason  for 
this keyboard advantage is the
single-word-at-a-time limitation of the
recognition  technology  employed  in 
these  experiments.  This  fact,  in 
turn, points strongly to a need
for development of at least a
limited continuous-speech recognition
capability.

The other advantage of the keyboard
entry method employed in this
study  was  that  less  effort  was  exerted 
by  the  operators  in  the  detection  and 
correction  of  errors  during  the 
message  composition  and  keying 
process.  A net  result,  of  course,  was 
the  persistence  of  more  undetected 
errors  in  the  completed  keyboard 
messages  which  has  already  been  cited 
as  an  advantage  of  the  voice  system. 
However,  the  fact  that  the  voice 
system  elicited  more  corrections 
(i.e.,  backspaces  and  erasures) 
bears on the responsiveness or convenience
of use of the voice system as
currently  configured.  Enunciating 
separate  words  close  together  in  time 
will  almost  always  result  in  rejection 
of the words or an error of recognition
or both. The rejection of an
input  is  signaled  by  an  audio  tone  or 
"beep,"  calling  the  attention  of  the 
operator  to  the  display.  As  has 
been  seen  in  the  results  of  these 
experiments,  operators  will  most  often 
correct  errors  resulting  from  improper 
input  cadence.  Rejections  due  to  this 
same  cause  are  recognized  as  such  with 
the  result  that  the  operator  then  pays 
special  attention  to  adjusting  input 
cadence  to  overcome  the  problem. 
There  is  no  doubt,  however,  that  the 
necessity  to  adapt  to  this  limitation 
of  word  recognition  systems  is  a 
source  of  annoyance  to  operators,  even 
in  the  relatively  uncomplicated 
environment  of  the  laboratory.  As 
with  a number  of  other  aspects  of  the 
performance of the operator/machine
voice  entry  system  (such  as  error 
rates,  data  entry  rates,  message 
translation  times)  all  of  which  tend 
to  be  interrelated,  this  factor 


could  not  be  expected  to  improve  in 
the  operational  environment.  It  is 
much more likely that the overall
performance found in the laboratory for
both the voice system and the keyboard
system used in these experiments would
deteriorate  in  the  operational  setting 
where  they  would  be  enmeshed  in,  and 
interact with, the other task components
of the total air traffic
controller  job. 

Even  as  these  experiments  were  being 
conducted,  the  technology  of  voice 
input  to  computers  has  been  advancing. 
There  have  been  some  improvements  in 
microphone  design.  There  are  other 
developments just emerging at this
writing  (reference  10)  which  appear  to 
bear  some  potential  for  improving  the 
operator  interface — perhaps  requiring 
less  adaptation  of  the  speech  habits 
of  the  speaker,  less  sensitivity  to 
microphone  positioning  and  technique, 
and  simplified  methods  of  "tuning" 
the  voice  system  to  the  individual 
operator. If some of these improvements
can be realized and verified,
they  may  greatly  reduce  or  eliminate 
limitations  of  voice  entry  in  the 
applications  investigated  here. 

Meanwhile,  the  results  of  these 
experiments indicate that word recognition
technology should be given
serious consideration as an alternative
to keyboard data entry in ATC
applications  of  the  reasonably  near 
future.  Specific  applications  which 
appear  capable  of  benefiting  from  the 
advantages  of  voice  entry  are  the 
radar  control  positions  in  enroute 
control  where  the  reduced  distraction 
of  attention  could  be  a valuable 
safety  feature  and  control  tower 
cab  operations  where  the  data  entry 
elements  of  the  jobs  are  less  onerous 
than  in  enroute  control  and  where 
keyboards  and  other  manual  controls 
cannot be made continuously and
conveniently  accessible  to  traffic 
controllers. 


CONCLUSIONS


1. Word recognition technology at its
current state of development has
demonstrable advantages in accuracy,
simplicity, and convenience over
existing keyboard methods of data
entry in ATC applications.

2. Present-day word recognition
has some definite limitations,
such as sensitivity to operator [...]
demonstrated lack of advantage in data
entry rates in the traffic control
applications tested.

Finally, it seems evident that the
aviation operational and evaluation
community should continue to maintain
contact with and contribute to this
area of rapidly developing technology.


RECOMMENDATIONS


1. Word recognition technology should
be given serious consideration as an
alternative to keyboard data entry in:

a. Applications where the data
entry job element is a source of
distraction of visual and mental
attention when performed with keyboards.

b. Applications where operator
access to keyboards is limited.

2. The development of improvements in
speech processing technology should
be awaited and evaluated before
adoption in any major upgrading of
The voice entry language for the NAS enroute flight data control
position closely parallels that presently employed for keyboard
entry at the same position. Thus, every message composed has a
rigid format. The specific words chosen by the user to represent
keystrikes or combinations of keystrikes are not sacred; however,
as Humpty Dumpty said, a word can be chosen by the user to mean
"precisely what you intend it to mean, nothing more; nothing less,"
since the user "is the Master here." The specific words suggested
by the "training" sequence of the voice entry system have been
experimentally tested, though, and have been found comparatively
easy to use and remember as well as reliably recognized by the
system. "Niner," for example, is a better choice than "nine."

The voice system functions in two modes: Training and Operation.
At any given time, the machine expects to hear from only one
particular person whom it has heard before. That is, it is
necessary for anyone who is to use the system to have trained it
beforehand. Training the system consists simply in the user saying
each word in the total vocabulary several times. By this means the
system establishes a composite reference image of the way the
particular speaker vocalizes each word in the vocabulary. Different
speakers, of course, have different voice pitch characteristics,
different vocal tracts and different personal pronunciations and
dialects. In general, this training process need only be done
completely one time. Experience with use of the system will reveal
occasional confusions (a common one is 8 for 3) which can be
reduced or eliminated by retraining; i.e., creating new reference
images, just for those words.

It is generally most undesirable for the speaker to adopt an
unnatural pronunciation when training or retraining a word. While
this works for the moment (for example stressing the "t" sound in
"eight") to eliminate erroneous recognitions by the system, it
immediately fails (usually resulting in rejects or complete
nonrecognition of utterances of the particular word) as soon as the
speaker reverts to habitual, natural pronunciation. The exact form
of utterance used by any single speaker, as noted above, is not
terribly important as long as it is always used. For example, you
may choose to use the expression "amendment" instead of the
suggested "amend" to produce the ASCII translation AM, but you
cannot say "amend" sometimes and "amendment" other times and still
secure accurate recognition.

The reference recognition patterns for each speaker are stored
magnetically (on tape or disk, for example), and in order to ready
the system for any specific person at any time the only action
required is to read into core storage the reference data for that
speaker.

In the operational mode, the system "sits waiting" for rigidly
formatted messages, even as the present keyboard entry system.
Messages cannot be composed free form or in idiomatic continuous
speech. The first utterance expected by the system is
one  of  the  words  specifying  a kind  of 
message,  such  as  AMEND  or  ACCEPT 
HANDOFF  (see  appended  lists  of  "parts 
of  speech").  Recognition  of  one  of 
these  "words"  produces  a two-character 
code  on  the  display  unit,  such  as  AM 
or  HO,  followed  by  a space.  In  most 
instances,  the  next  part  of  the 
message  must  be  a three-decimal  digit 
number  (the  computer  I.D.)  to  identify 
the  flight  or  flight  data  store  to 
which the message applies. The three
digits must be spoken separately, word
by word, for example "three, six,
eight" (not "threesixtyeight"). The
third digit is automatically followed
by a space. The next following entry
expected  by  the  system  depends  on 
the  kind  of  message  selected  by  the 
first  utterance — AMEND,  for  example, 
followed  by  the  three  digit  I.D., 
thereafter  expects  the  name  of  a 
flight  plan  data  field  such  as  SPEED 
or  ALTITUDE  (see  list  of  field  names) 
followed  by  an  entry  appropriate  to 
that  data  field  (e.g.,  "four,  two, 
five"  or  "three,  seven,  zero"). 
Spacing,  when  required,  is  auto- 
matically supplied.  After  the  modi- 
fied data  have  been  entered,  the 
system  expects  either  the  name  of 
another  data  field  or  one  of  the 
control  words:  GO  (enter),  BACKSPACE, 
or  ERASE.  The  "GO"  word  (you  could 
say  "enter"  or  "finished"  or  almost 
any  other  word  as  long  as  you  trained 
it  that  way  and  always  said  it  that 
way)  causes  the  message  to  be  trans- 
mitted from  the  data  entry  system  to 
the  processor  as  a completed  message. 
The functions of BACKSPACE and ERASE
are  practically  self-explanatory 
but  are  also  described  in  the  attached 
notes  and  reference  tables. 
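The message assembly described above can be sketched in a few lines of Python. This is only an illustration, not the program used in the experiments: the function names `read_digits` and `translate_amend` are hypothetical, only the three all-digit data fields are handled, and the recognizer is assumed to have already delivered one recognized word per list element.

```python
# Vocabulary subsets taken from the tables in this appendix.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "niner": "9"}
# Spoken field name -> (two-digit code displayed, digits expected afterwards).
NUMERIC_FIELDS = {"speed": ("05", 3), "time": ("07", 4), "altitude": ("08", 3)}


def read_digits(words, pos, count):
    """Return (digit string, next position) for `count` separately spoken digits."""
    chunk = words[pos:pos + count]
    if len(chunk) != count or any(w not in DIGITS for w in chunk):
        raise ValueError(f"expected {count} digit words at position {pos}")
    return "".join(DIGITS[w] for w in chunk), pos + count


def translate_amend(words):
    """Translate the recognized words of one AMEND message into display text."""
    words = [w.lower() for w in words]
    if not words or words[0] != "amend":
        raise ValueError("message must start with AMEND")
    ident, pos = read_digits(words, 1, 3)      # three-digit computer I.D.
    shown = ["AM", ident]                      # spacing is supplied automatically
    while pos < len(words):
        word = words[pos]
        if word == "go":                       # GO transmits the completed message
            if pos != len(words) - 1:
                raise ValueError("nothing may follow GO")
            return " ".join(shown)
        if word not in NUMERIC_FIELDS:
            raise ValueError(f"unexpected word {word!r}")
        code, count = NUMERIC_FIELDS[word]
        value, pos = read_digits(words, pos + 1, count)
        shown += [code, value]
    raise ValueError("message must end with GO")
```

For example, `translate_amend("amend three three one altitude three two zero go".split())` yields the display string `AM 331 08 320`. Because every step expects one word from a small, known subset, the rigid format also narrows what the recognizer has to distinguish at each point in the message.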

Besides  rigid  adherence  to  the  message 
format  rules  (see  tables  attached) 
the  user  requires  only  four  cautions: 


1.  Upon  taking  over  the  operating 
position,  it  is  necessary  to  position 
the  microphone  with  some  care  at 
approximately  the  same  distance  and 
direction  from  the  mouth  at  all  times. 

2.  Set  the  input  volume  control  to 
the  same  setting  found  during  training 
to  be  the  best  for  the  speech  loudness 
habits  of  the  particular  talker. 

3.  Say  the  words  naturally,  with- 
out pauses  within  a "word"  even  if 
they  are  actually  phrases,  such  as: 
DISCRETECODE. 

4. Pause briefly (about 1/10 second)
between  "words,"  as:  DISCRETECODE, 
ONE,  THREE,  ONE,  etc.  to  allow  the 
machine  to  separate  and  translate 
each  word  in  the  message. 

Certain  types  of  messages  currently 
possible  with  the  existing  key  entry 
system  are  not  (at  present)  possible 
at  all  with  the  voice  system.  Flight 
data  field  10,  route  of  flight,  for 
example  is  not  programed.  There 
are  two  connected  reasons  for  this. 
One  is  that  an  exhaustive  list  of 
fixes,  intersections  and  airways 
would  be  extremely  large  and  the 
time-to-train, time-to-search, and
recognition accuracy would suffer very
seriously. The second reason, of
course, is that entry of total flight
plans or extended routes-of-flight at
controller  operating  positions  is 
mercifully  rare  and  generally  undesir- 
able in  any  event.  Another  case  of 
(at  least  for  the  present)  difficult 
voice  entry  is  alphanumeric  identity 
(airline  flight,  or  military  name/#  or 
airframe  tail  number).  As  can  be 
imagined,  this  would  be  necessarily 
complex,  especially  since  airline 
names,  military  code-names,  plus  the 
obvious  phonetic  alphanumerics  would 
all  be  involved.  The  method  we  have 
provided for entry of aircraft types
(explained  in  detail  in  the  attached 
tables  and  notes)  is  rather  similar  to 
that  which  is  required  for  alpha- 
numeric I.D.  and  is  indeed  rather 
cumbersome. 

The  fact  of  the  matter  at  this  point 
is  that  the  whole  phonetic  alphabet 
is  part  of  the  vocabulary  of  this 
system.  (It  should  be  noted  here 
that  this  can  be  whichever  of  the  many 
phonetic  alphabets  you  are  comfortable 
with,  not  necessarily  the  items 
displayed  in  the  prompting  words 
for  initial  system  training.) 


Furthermore,  the  logic  for  assembling 
strings  of  letters  and  numbers  is 
inherent  in  the  program  used  for 
message  assembly.  Thus,  with  no 
great  difficulty,  the  system  could  be 
modified  to  include  the  capability 
of  a full  "voice  keyboard"  with 
which  any  message  whatever  could  be 
(laboriously)  composed  "voice  key"  by 
"voice  key."  At  this  juncture,  this 
is  unnecessary  for  purposes  of  basic 
experimentation  with  the  system.  In 
the  longer  run,  it  would  seem  prefer- 
able to  modify  the  operational 
procedure  than  to  require  extensive 
keypunching  in  any  event. 
VOICE DATA ENTRY: D-CONTROLLER VOCABULARY


DIGITS  PART  OF  SPEECH 


PRINT  WORD 
DISPLAYED 


WORD  NO.* 


SPOKEN  WORD 


ZERO 


THREE 


FOUR 


FIVE 


SEVEN 


EIGHT 


NINER 


CONTROL  WORDS  (SEE  ALSO  #102  ERASE) 


BACKSPACE 


(ENTER) 


*For  retraining  and  other  purposes 


MESSAGE KINDS

WORD NO.*   SPOKEN WORD       PRINT WORD DISPLAYED

12          AMEND             AM
13          CANCEL            CN
14          CORRECTION        CR
15          DEPARTURE         DM
16          DISCRETECODE      DQ
17          READOUT           FR
18          ACCEPTHANDOFF     HO
19          HANDOFF           HO
20          DROPTRACK         RS
21          PRINTSTRIP        SR
22          HOLD              HM
23          RELEASE           HM
24          REPORTALTITUDE    RA
25          WEATHER           WR
26          TRANSMIT          RF

*For retraining and other purposes.
FLIGHT DATA FIELD NAMES

WORD NO.*   SPOKEN WORD       PRINT WORD DISPLAYED

27          TYPE              03
28          QUALIFIER         03
29          BEACONCODE        04
30          SPEED             05
31          FIX               06
32          TIME              07
33          ALTITUDE          08
34          IDENT             02

*For retraining and other purposes.
FIXES

WORD NO.*   SPOKEN WORD       PRINT WORD DISPLAYED

            WILLIAMSPORT
            SELINGSGROVE
            MILTON
            HAZELTON
            WILKESBARRE
            EASTTEXAS
            LAKEHENRY
            TOBYHANNA
            ALLENTOWN
            STILLWATER
            BENTON
            SWEETVALLEY
            LOPEZ
            SNYDERS
            SLATINGTON
            WHITEHAVEN
            RESORT
            PENNWELL
            HUGUENOT
            SOLBERG
            FREELAND

*For retraining and other purposes.


AIRCRAFT TYPE NAMES

WORD NO.*   SPOKEN WORD       PRINT WORD DISPLAYED

56          BOEING            B
57          DOUGLAS           DC
58          LOCKHEED          L
59          CONVAIR           C
60          VICKERS           VC
61          NORD              N
62          BRITISH           BA
63          GENERAL           --
64          MILITARY          --
65          DEHAVILLAND       DH

*For retraining and other purposes.
PHONETIC ALPHA (Continued)

WORD NO.*   SPOKEN WORD       PRINT WORD DISPLAYED

84          SIERRA            S
85          TANGO             T
86          UNIFORM           U
87          VICTOR            V
88          WHISKEY           W
89          XRAY              X
90          YANKEE            Y
91          ZULU              Z

*For retraining and other purposes.
"QUALIFIERS"**

WORD NO.*   SPOKEN WORD        PRINT WORD DISPLAYED

92          DISCRETE           /U
93          DISCRETE DME       /A
94          DME                /D
95          NONDISCRETE        /T
96          NONDISCRETE DME    /B
97          TRANSPONDER        /X
98          TRANSPONDER DME    /L
99          TACAN              /M
100         TACAN 64           /N
101         TACAN DISCRETE     /P

**These expressions are to be said as all one word, such as
"discrete dee em," even though printed here and on the training
display as separate words.

CONTROL WORD (SEE ALSO #10 BACKSPACE and #11 GO)

102         ERASE              Erases Entry

*For retraining and other purposes.
CONTROL WORDS

GO          Momentarily prints (ENTER) on display, then clears the
            screen. The message, including any backspacing, is
            recorded for data collection purposes, together with the
            time, in seconds, between the selection of a message kind
            entry and the GO entry.

ERASE       Clears whole message, awaits a "Message Kind" entry.

BACKSPACE   Removes last spoken entry, awaits replacement from the
            same subset of words.

Note: Backspace, due to the storage characteristics of the Tektronix display,
erases the whole screen then rewrites the message all but the last entry made
by voice. If this (i.e., the entry backspaced out) was a single character in
a string of digits (as in a time, altitude, speed, beacon code) or alpha/digits
(as in General or Military types), only the one character will disappear, and
the machine will await another number or letter. If the last entry (i.e.,
"word") was a data field (e.g., speed = 05) or a fix (e.g., Williamsport = IPT)
etc., the whole string, such as 05 or IPT will be removed, and the machine will
await another entry from the same class of words as the word deleted (e.g.,
"altitude" instead of "speed" or "Allentown" instead of "Williamsport").
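The BACKSPACE and ERASE behavior in the note above can be modeled by keeping one stored item per spoken entry and redrawing the whole line from that list. The sketch below is an assumption about how such a buffer might be organized, not the original program; the class `MessageBuffer` and its `joins_previous` flag are hypothetical names.

```python
class MessageBuffer:
    """Keeps one item per spoken entry so BACKSPACE can undo exactly one entry."""

    def __init__(self):
        # Each item is (displayed text, joins_previous). joins_previous is True
        # for the second and later characters of a digit or alphanumeric string.
        self._entries = []

    def speak(self, shown, joins_previous=False):
        """Record the display text produced by one recognized word."""
        self._entries.append((shown, joins_previous))

    def backspace(self):
        """Remove only the last spoken entry (one character or one whole code)."""
        if self._entries:
            self._entries.pop()

    def erase(self):
        """Clear the whole message; a message kind is expected next."""
        self._entries.clear()

    def display(self):
        """Rewrite the full line, as the storage display must do after BACKSPACE."""
        parts = []
        for shown, joins_previous in self._entries:
            if joins_previous and parts:
                parts[-1] += shown
            else:
                parts.append(shown)
        return " ".join(parts)


buf = MessageBuffer()
buf.speak("AM")
for i, digit in enumerate("331"):
    buf.speak(digit, joins_previous=i > 0)
buf.speak("05")      # "speed" was recognized, but "altitude" was intended
buf.backspace()      # the whole field code 05 is removed
buf.speak("08")
print(buf.display())
```

Storing entries rather than characters is what makes a backspaced field name or fix disappear as a unit, while a backspaced digit removes a single character.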
VOICE DATA ENTRY: D-CONTROLLER LANGUAGE STRUCTURE


KIND OF MESSAGE   SEQUENCE

AMEND             3 DIG IDENT, DATA FIELD NAME, DATA ENTRY FOR FIELD, GO*
CORRECTION        DATA FIELD NAME, DATA ENTRY FOR FIELD, GO*


DATA FIELD NAMES  VOICE ENTRIES REQUIRED

Type              See Below
Beacon Code       4 Octal Digits
Speed             3 Decimal Digits
Fix               Place Name
Time              4 Decimal Digits
Altitude          3 Decimal Digits
Qualifier         See List of Qualifiers and Note 2
Ident             6 Alphanumerics


Note  1:  After  a "field  name"  and  appropriate  entries  for  that  field  have 
been entered, the system will accept another field name (plus proper entries)
and  yet  another,  etc.,  without  limit,  OR  it  will  accept  an  ERASE  command,  a 
BACKSPACE  command,  or  a GO  (ENTER)  command.  For  detailed  description  of 
ERASE  and  BACKSPACE,  see  attached  Note  on  the  subject. 


FOR "TYPE" ENTRIES, ALWAYS SAY:

   MFG NAME,   2 or 3 A/N,   Name a Qualifier
or MILITARY,   4 A/N,        Name a Qualifier
or GENERAL,    4 A/N,        Name a Qualifier


IF YOU SAY:    YOU'LL SEE:   THEN SAY:

Boeing         B             3 A/N e.g. 707
British        BA            2 A/N e.g. 11
Vickers        VC            2 A/N e.g. 10
Lockheed       L             3 A/N e.g. 011
Nord           N             3 A/N e.g. 026
deHavilland    DH            2 A/N e.g. C6
Douglas        DC            2 A/N e.g. 10
Military       --            4 A/N e.g. C131
General        --            4 A/N e.g. PA13

*The last entry in a message, of course, is always GO if all is correct.
However, even at this point (as at any other) it is possible to ERASE or
BACKSPACE.
TO ENTER A "TYPE," YOU MUST ALWAYS ADD ONE OF THE EQUIPMENT QUALIFIERS:


IF  YOU  SAY: 


YOU'LL  SEE 


Discrete  /U 

Discrete  DME  /A 

DME  /D

Nondiscrete  /T 

Nondiscrete  DME  /B 

Transponder  /X 

Transponder  DME  /L 

TACAN  /M 

TACAN64  /N 

TACANDiscrete  /P 

Note  2:  If  you  wish  to  enter  an  amendment  to  the  QUALIFIER  part  of  the 
"type"  field  alone,  you  need  only  name  the  data  field  "QUALIFIER"  then  name 
one  of  the  qualifiers  above. 

FINALLY, YOU MAY SAY "GO" (to ENTER), "BACKSPACE" (if you wish to change
or correct an error of entry or of recognition) or "ERASE,"

OR,  YOU  MAY  NAME  ANOTHER  DATA  FIELD  AND  CONTINUE  AS  BEFORE. 

EXAMPLES: 

Say:  Amend,  three,  three,  one,  altitude,  three,  two,  zero,  go 
See:  AM  331  08  320  (screen  erases  at  GO) 

Say:  Amend,  two,  zero,  five,  type,  Boeing,  seven,  zero,  seven,  discrete- 
deemee,  go 

See:  AM  205  03  B707  /A  (screen  erases  at  GO) 

Say:  Correction,  speed,  two,  two,  zero,  type,  military,  Charlie,  one, 
three,  one,  TACAN,  go 

See:  CR  05  220  03  C131  /M  (screen  erases  at  GO) 

Say:  Correction,  qualifier,  nondiscrete,  go 
See:  CR  03  /T  (screen  erases  at  GO) 
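The rule for "type" entries (a manufacturer name, a fixed number of alphanumerics, then one equipment qualifier) lends itself to a table-driven sketch. The tables below are copied from this appendix; the function name `translate_type`, the one-word spellings of the qualifiers, and the three phonetic letters included are illustrative assumptions rather than part of the original system.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "niner": "9"}
# Only the letters needed for the examples; the vocabulary holds a full alphabet.
LETTERS = {"alpha": "A", "charlie": "C", "papa": "P"}
ALNUM = {**DIGITS, **LETTERS}

# Spoken name -> (prefix displayed, alphanumerics that must follow).
MAKERS = {"boeing": ("B", 3), "british": ("BA", 2), "vickers": ("VC", 2),
          "lockheed": ("L", 3), "nord": ("N", 3), "dehavilland": ("DH", 2),
          "douglas": ("DC", 2), "military": ("", 4), "general": ("", 4)}

# Qualifiers are spoken as a single "word"; the spellings here are assumed.
QUALIFIERS = {"discrete": "/U", "discretedme": "/A", "dme": "/D",
              "nondiscrete": "/T", "nondiscretedme": "/B", "transponder": "/X",
              "transponderdme": "/L", "tacan": "/M", "tacan64": "/N",
              "tacandiscrete": "/P"}


def translate_type(words):
    """Return the display text for one 'type' entry, e.g. 'B707 /A'."""
    words = [w.lower() for w in words]
    if not words or words[0] not in MAKERS:
        raise ValueError("type entry must start with a manufacturer name")
    prefix, count = MAKERS[words[0]]
    chars = words[1:1 + count]
    if len(chars) != count or any(c not in ALNUM for c in chars):
        raise ValueError(f"expected {count} alphanumeric words after the name")
    rest = words[1 + count:]
    if len(rest) != 1 or rest[0] not in QUALIFIERS:
        raise ValueError("a type entry must end with one equipment qualifier")
    return prefix + "".join(ALNUM[c] for c in chars) + " " + QUALIFIERS[rest[0]]
```

With these tables, `translate_type("Boeing seven zero seven discretedme".split())` gives `B707 /A`, and `translate_type("military Charlie one three one TACAN".split())` gives `C131 /M`, matching the examples above. In every row of the table the prefix and the alphanumerics together fill exactly four characters.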
KIND OF MESSAGE     SEQUENCE

REPORTALTITUDE      3 DIG IDENT, 3 DECIMAL DIG (ALT), GO*
DISCRETECODE        3 DIG IDENT, 4 OCTAL DIG (CODE), GO*
HANDOFF             3 DIG IDENT, 2 DIG (SECTOR), GO*
DEPARTURE           3 DIG IDENT, 4 DEC DIG (TIME), (3 DIG ALT.), GO*

These messages consist of the message kind followed by all digits. The
altitude for departures is optional and must be preceded by the word "altitude"
where made, e.g., "DEPARTURE, THREE TWO ZERO, ONE FOUR TWO FIVE, ALTITUDE, TWO
ONE ZERO."
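These all-digit messages differ only in the display code, the size of each digit group, and whether a group is octal or decimal, so they can be described by a small format table. The following is a sketch under those assumptions; `FORMATS`, `read_group`, and `translate_digit_message` are hypothetical names, and the treatment of the optional departure altitude (displayed behind field code 08) follows the DEPARTURE example given in this appendix.

```python
DIGIT_VALUES = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
                "five": 5, "six": 6, "seven": 7, "eight": 8, "niner": 9}

# Spoken kind -> (display code, [(digit count, number base), ...], altitude allowed)
FORMATS = {
    "reportaltitude": ("RA", [(3, 10), (3, 10)], False),
    "discretecode":   ("DQ", [(3, 10), (4, 8)], False),
    "handoff":        ("HO", [(3, 10), (2, 10)], False),
    "departure":      ("DM", [(3, 10), (4, 10)], True),
}


def read_group(words, pos, count, base):
    """Read `count` digit words, rejecting digits not valid in `base`."""
    chunk = words[pos:pos + count]
    if len(chunk) != count:
        raise ValueError(f"expected {count} digit words at position {pos}")
    digits = []
    for word in chunk:
        if word not in DIGIT_VALUES or DIGIT_VALUES[word] >= base:
            raise ValueError(f"{word!r} is not a valid base-{base} digit word")
        digits.append(str(DIGIT_VALUES[word]))
    return "".join(digits), pos + count


def translate_digit_message(words):
    """Translate one all-digit message into its display text."""
    words = [w.lower() for w in words]
    if not words or words[0] not in FORMATS:
        raise ValueError("unknown message kind")
    code, groups, altitude_allowed = FORMATS[words[0]]
    shown, pos = [code], 1
    for count, base in groups:
        text, pos = read_group(words, pos, count, base)
        shown.append(text)
    # The departure altitude is optional and must be announced by "altitude".
    if altitude_allowed and words[pos:pos + 1] == ["altitude"]:
        text, pos = read_group(words, pos + 1, 3, 10)
        shown += ["08", text]
    if words[pos:] != ["go"]:
        raise ValueError("message must end with GO")
    return " ".join(shown)
```

A beacon code containing "eight" or "niner" is rejected by this sketch because the code group is declared octal; whether the experimental system enforced this at recognition time is not stated in the text.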


KIND OF MESSAGE     SEQUENCE

DROPTRACK           3 DIG IDENT, GO*
PRINTSTRIP          3 DIG IDENT, GO*
ACCEPTHANDOFF       3 DIG IDENT, GO*
READOUT             3 DIG IDENT, GO*
CANCEL              3 DIG IDENT, GO*


These  messages  are  all  identical  except  for  the  first  word,  the  kind  of 
message. 


KIND OF MESSAGE     SEQUENCE

HOLD                3 DIG IDENT, 4 DIG (TIME), NAME (FIX), GO*
RELEASE             3 DIG IDENT, 4 DIG (TIME), GO*
TRANSMIT            3 DIG IDENT, NAME (FIX), GO*


These  messages  require  entry  of  a four  digit  time,  or  a one  word  place 
name  (FIX)  or  both  in  addition  to  the  message  kind  and  the  identity  of  the 
flight. 


KIND OF MESSAGE     SEQUENCE

WEATHER             NAME (FIX), GO*


This  and  CORRECTION  (above)  are  the  only  kinds  of  messages  that  are 
not  immediately  followed  by  identity. 


*This last entry in a message, of course, is always GO if all is correct. However,
even at this point (as at any other) it is possible to ERASE or BACKSPACE.
EXAMPLES OF VOICE ENTRY


Note:  When  you  say  "GO"  at  the  end  of  a message,  (ENTER)  will  be  written 
briefly,  then  the  whole  message  will  disappear. 

Modify  assigned  altitude  of  track  #221  to  level  370: 

Say:  AMEND;  two,  two,  one;  altitude;  three,  seven,  zero;  GO 
SEE:  AM  221  08  370  (ENTER) 

Correct  rejected  message  in  speed  data  field  to  420: 

Say:  CORRECTION;  speed;  four,  two,  zero;  GO 
See:  CR  05  420  (ENTER) 

Modify aircraft type and qualifier, track #397, to Boeing 707, discrete code
with DME:

Say:  AMEND;  three,  niner,  seven;  type;  Boeing,  seven,  zero,  seven; 
discretedeemee;  GO 

See:  AM  397  03  B707  /A  (ENTER) 

Correct  rejected  message,  qualifier  only,  to  discrete  code  transponder: 

Say:  CORRECTION;  qualifier;  discrete;  GO 
See:  CR  03  /U  (ENTER) 

Handoff,  to  sector  12,  track  #424: 

Say:  HANDOFF;  four,  two,  four;  one,  two,  GO 
See:  HO  424  12  (ENTER) 

Accept the Handoff of track #377:

Say: ACCEPTHANDOFF; three, seven, seven; GO
See: HO 377 (ENTER)

Note:  PRINTSTRIP  (SR),  READOUT  (FR),  CANCEL  (CN),  and  DROPTRACK  (RS)  have 
the  same  format  except  for  the  code  for  the  kind  of  message. 
Enter departure message for track 131, time 2025, altitude 175:

Say: DEPARTURE; one, three, one; two, zero, two, five; altitude;
one, seven, five; GO

See:  DM  131  2025  08  175  (ENTER) 

Enter  reported  altitude  of  350  for  track  #952 

Say: REPORTALTITUDE; niner, five, two; three, five, zero; GO
See:  RA  952  350  (ENTER) 

Enter  discrete  code  of  2200  for  track  #756: 

Say:  DISCRETECODE;  seven,  five,  six;  two,  two,  zero,  zero;  GO 
See:  DQ  756  2200  (ENTER) 

Enter  hold  message  for  track  333,  time  1445  at  Williamsport: 

Say: HOLD; three, three, three; one, four, four, five; Williamsport; GO
See:  HM  333  1445  IPT  (ENTER) 

Enter  a release  (hold)  message  at  1500  hours  for  track  #333: 

Say:  RELEASE;  three,  three,  three;  one,  five,  zero,  zero;  GO 
See:  HM  333  1500  (ENTER) 

To  force  a flight  plan  (e.g.  #123)  to  an  ARTS  terminal  (e.g.  Allentown)  prior 
to  the  scheduled  time  (e.g.,  early  flight): 

Say:  TRANSMIT;  one,  two,  three;  Allentown;  GO 
See:  RF  123  ABE  (ENTER) 

To  obtain  a weather  readout,  for  Williamsport 

Say:  WEATHER;  Williamsport;  GO 
See:  WR  IPT  (ENTER) 

Amend  identity  of  track  #416  to  American  142: 

Say: AMEND; four, one, six; IDENT; Alpha, Alpha,
zero, one, four, two; GO

See: AM 416 02 AA0142 (ENTER)


EXAMPLES  OF  AIRCRAFT  TYPES 


COMMERCIAL 

B707, B727, B747, B737
DC09, DC10, DC81
L101, L49C, L49E, L188
DH06, DH64
VC07, VC09
CV88, CV58, CV99
BA11, BA10, BA15

MILITARY

C135, C131, C05A
F102, F11D, F11F
B058, B052, B057
KC97, DC35

GENERAL

AC68
BE33, BE55, BE80
C180, C165, C310, C340
DH03
M020, M021
N265, N40A, NA16
PA22, PA28, PACT
T039

Note: It is realized that the four-character system in use for voice entry will
not easily accommodate all type designators. It is employed here for test
use only.
APPENDIX B


SAMPLE  PSEUDORANDOM  WORD  LIST 


ZERO 

THREE 

TWO 

SIX 

ONE 

SEVEN 

ERASE 

THREE 

TWO 

BACKSPACE 

SIX 

BACKSPACE 

THREE 

FOUR 

ONE 

EIGHT 

FOUR 

EIGHT 

NINE 

FIVE 

FIVE 

ZERO 

FIVE 

TWO 

SIX 

FIVE 

ZERO 

ONE 

SEVEN 

NINE 

EIGHT 

ERASE 

EIGHT 

ONE 

FOUR 

SEVEN 

NINE 

SIX 

BACKSPACE 

FOUR 

ERASE 

ERASE 

SEVEN 

ZERO 

BACKSPACE 

TWO 

THREE 

NINE 

ONE 

FOUR

ERASE 

THREE 

NINE 

NINE 

FIVE 

TWO 

SEVEN 

SEVEN 

FIVE 

FIVE 

NINE 

ERASE 

THREE 

ERASE 

THREE 

ONE 

ZERO 

SIX 

BACKSPACE 

TWO 

BACKSPACE 

EIGHT 

FOUR 

SEVEN 

SIX 

SIX 

ZERO 

FOUR 

EIGHT 

EIGHT 

TWO 

BACKSPACE 

ONE 

ZERO 

TWO 

BACKSPACE 

THREE 

FIVE 

ERASE 

ERASE 

EIGHT 

NINE 

FIVE 

BACKSPACE 

EIGHT 

TWO 

THREE 

SEVEN 

NINE 

SIX 

SIX 

FOUR 

NINE 

FIVE 

ONE 

ZERO 

FOUR 

EIGHT 

FOUR 

THREE 

ZERO 

SEVEN 

TWO 

SEVEN 

ERASE 

ONE 

BACKSPACE 

ONE 

ZERO 

SIX 

APPENDIX  C 


SAMPLE  OF  RAW  DATA,  EXPERIMENT  I 
SAMPLE  OF  PROCESSED  DATA,  EXPERIMENT  I 


I 


SAMPLE  RAW  DATA 
EXPERIMENT  I 


+00000  +00231  +00007  +00136  +00176 


+00001 

+00002 

□00003 

[+00004 

+00005 

+00006 


♦00009 
+00010 
♦00011 
♦O'"  001 
+00003 
+00005 
♦00007 
+00009 
+00010 
+00000 
+00002 
+00004 
+00006 
+0000B 
+0001 1 
♦00002 
+00005 
+00008 
+00011 
+00003 
+00006 
+00009 
+00000 
+00004 
+00007 
+00010 
+00001 
♦00003 
+00007 
+00011 
+00004 
+00008 
+00000 
+00005 
+00009 
+00001 
+00006 
+00010 
+00002 
+00004 
+00009 
+00002 
+00005 
+00010 
+00003 
+00006 
+00011 
+00007 
+00000 
+00008 


+00232 

+00240 

+000EL 

+002161 

+00212 

+00179 

mat 

+00156 

+00241 

+00145 

+00193 

+00178 

+00181 

+00168 

+00142 

♦00274 

+00228 

+00249 

+00246 

+00157 

♦'0235 

+00165 

+00242 

+00210 

+00189 

+00184 

+00160 

+00147 

+00156 

+00222 

+00225 

+00210 

+00287 

+00192 

+00246 

+00234 

+00176 

+00230 

+00152 

+00212 

+00199 

+00166 

+00202 

+00190 

+00262 

+00210 

+00215 

+00182 

+00215 

+00198 

+00281 

+00260 

+00211 

+00166 

+00243 

+00206 

+00175 


+00005  +00107  +00094 
+00000  +00108  +00111 
+00010  +00CF1  +00214 
+00001  +QOC?S  +00133 
+00001  +00137  +00147 
+00010  +00109  +00222 
+00006  +00113  +00165 


+00011 

+00003 

+00010 

+00005 

+00002 

+00009 

+00006 

+00011 

+00006 

♦00007 

+00003 

+00001 

+00010 

+00003 

+00010 

+00000 

+00009 

+00003 

+00010 

+00002 

+00010 

+00011 

+00002 

+00009 

+00000 

+00003 

+00004 

+00010 

+00006 

+00010 

+00009 

+00003 

+00007 

+00009 

+00011 

+00005 

+00010 

+00003 

+00000 

+00009 

+00011 

+00000 

+00009 

+00003 

+00010 

+00010 

+00003 

+00006 

+00002 

+00003 


+00135 

+00132 

+00095 

♦00095 

+00125 

+00135 

+00104 

+00091 

+00156 

♦00126 

+00142 

+00132 

+00091 

+00126 

+00100 

+00121 

+00130 

+00113 

+00116 

+00109 

+00122 

+00079 

♦00106 

+00100 

+00113 

+00170 

+00121 

+00159 

+00106 

+00105 

+00130 

+00095 

+00098 

+00169 

+00097 

+00127 

+00088 

+00165 

+00116 

+00101 

+00089 

+00115 

+00174 

+00160 

+00161 

+00117 

+00046 

+00114 

+00082 

+00105 


+00145 
+00215 
+00277 
+00101 
+00156 
+00134 
+00160 
+00154 
+00221 
+00175 
+00124 
+00144 
+00209 
+00091 
+00266 
♦00131 
+00164 
+00085 
+00280 
+00138 
+00225 
+00140 
+00180 
+00161 
100157 
+00210 
+00 120 
+00170 
+00175 
+00266 
100153 
+00087 
+00183 
+00162 
+00144 
+00109 
+00188 
+00202 
+00118 
+00146 
+00151 
+00117 
+00149 
+00213 
+00130 
+00195 
+00200 
+00166 
+00188 
100079 


FIRST  CHOICE  WORD  NUMBER 
FIRST  CHOICE  "CORRELATION" 

■ SECOND  CHOICE  WORD  NUMBER 
SECOND  CHOICE  "CORRELATION" 
-DURATION  OF  WORD  SPOKEN 


CORRELATION  LESS  THAN  80, 
THEREFORE  REJECTED 


•REJECTED,  THOUGH  CORRECT 


79-20-C-I 
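The callouts on this sample identify the five numbers recorded for each spoken word and state that an utterance was rejected when its correlation fell below 80. A minimal sketch of reading one such record follows. It assumes the five values appear in the callout order and that the threshold applies to the first-choice correlation; the names `Recognition`, `parse_row`, and `is_rejected` are hypothetical, and other rejection causes noted on the printouts (such as mismatched durations) are not modeled.

```python
from typing import NamedTuple

REJECT_THRESHOLD = 80  # "correlation less than 80, therefore rejected"


class Recognition(NamedTuple):
    first_word: int     # first choice word number
    first_corr: int     # first choice "correlation"
    second_word: int    # second choice word number
    second_corr: int    # second choice "correlation"
    duration: int       # duration of word spoken (units not given in the text)


def parse_row(line):
    """Parse one five-value record such as '+00000 +00231 +00007 +00136 +00176'."""
    values = [int(token) for token in line.split()]
    if len(values) != 5:
        raise ValueError("expected five values per record")
    return Recognition(*values)


def is_rejected(record):
    """Apply the threshold to the first-choice score (an assumption, see above)."""
    return record.first_corr < REJECT_THRESHOLD


sample = parse_row("+00000 +00231 +00007 +00136 +00176")  # first record of the sample
print(sample.first_word, sample.first_corr, is_rejected(sample))
```

Comparing the first and second choice scores of each record in this way also shows how close the system came to confusing two vocabulary words.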




SAMPLE  RAW  DATA 
EXPERIMENT  I (Continued) 


1 


+00001  +00197  +00004  +00136  +00110
+00011  +00132  +00008  +00054  +00196
+00010  +00261  +00006  +00155  +00221
+00009  +00199  +00011  +00114  +00147
+00008  +00210  +00011  +00106  +00099
+00007  +00233  +00009  +00131  +00161
+00006  +00209  +00010  +00155  +00198
+00005  +00185  +00009  +00155  +0013 >
+00004  +00231  +00009  +00120  +00152
+00003  +00243  +00010  +00168  +00122
+00002  +00238  +00007  +00120  +00128
+00001  +00225  +00005  +00144  +00117
+00000  +00186  +00007  +00064  +00183
+00002  +00264  +00000  +00136  +00132
+00010  +00283  +00003  +00174  +00204
+00006  +00216  +00010  +00130  +00189
+00001  +00218  +00005  +00125  +00125
+00009  +00219  +00005  +00129  +00169
+00005  +00170  +00009  +00146  +00136
+00000  +00213  +00002  +00113  +00177
+00008  +00167  +00003  +00105  +00082
+00004  +00217  +00001  +00114  +00149
+00011  +00185  +00008  +00086  +00239
+00007  +00215  +00006  +00110  +00160
+00003  +00249  +00010  +00158  +00109
+00010  +00235  +00003  +00130  +00203
+00009  +00144  +00005  +00138  +00081
+00007  +00225  +00006  +00105  +00159
+00005  +00193  +00009  +00159  +00140
+00003  +00271  +00010  +00185  +00135
+00001  +00226  +00005  +00117  +00106
+6ooli  +00191  +00008  +00089  +00248
+00008  +00183  +00003  +00121  +00082
3>ooO”  +00120  +00007  +00107  +00119
+00004  +00226  +00009  +00132  +00345
+00002  +00208  +00008  +00118  +00105
+00000  +00211  +00002  +00092  +00183
+00003  +00266  +00010  +00165  +00124
+00010  +00271  +00003  +00146  +00209
+00005  +00222  +00009  +00190  +00144
+00002  +00212  +00008  +00108  +00106
+00009  +00236  +00004  +00105  +00154
+00004  +00233  +00009  +00088  +00129
+00001  +00223  +00007  +00130  +00126
+00008  +00227  +00003  +00151  +00079
+00000  +00221  +00002  +00125  +00187
+00007  +00225  +00009  +00119  +00169
+00011  +00178  +00010  +00089  +00258
+00006  +00185  +00010  +00118  +00202
+00006  +00215  +00010  +00141  +00196
+00003  +00273  +00010  +00153  +00121
+00011  +00162  +00010  +00075  +00249
+00008  +00197  +00003  +00092  +00092
+00005  +00184  +00009  +00160  +00141
+00002  +00223  +00003  +00109  +00120
+00001  +00228  +00007  +00127  +00107
+00010  +00275  +00003  +00182  +00193
+00007  +00217  +00000  +00124  +00147
+00004  +00208  +00009  +00089  +00123
+00000  +00216  +00002  +00096  +00174
+00009  +00180  +00011  +00123  +00133

R 


REJECTED  DURATIONS  MISMATCHED 

ERROR  (BEST  CHOICE  WAS 

"ERASE").  WAS  ACTUALLY 
REJECTED. 


79-20-C-2 


SAMPLE  PROCESSED  DATA 
EXPERIMENT  I 


11  10  0 0 168.4 

3 1 46.0 

8 3 76.3 

10  6 96.7 


17.64 


247.9  27.75 


SAMPLE  PROCESSED  DATA 

EXPERIMENT  I (Continued) 

7 

10 

0 

0 217.5 

19.86 

161.9 

7.15 

0 

n 

Am 

118.5 

i 

6 

6 

108.7 

9 

n 

Am 

125.0 

8 

10 

0 

0 187.6 

29.04 

85.9 

6.09 

2 

1 

109.0 

3 

8 

113.5 

11 

, 1 

106.0 

9 

10 

0 

0 178.0 

30,12 

141.8 

22.23 

4 

1 

105.0 

5 

2 

133.5 

11 

7 

104.0 

1 

10 

10 

0 

0 267.0 

16.56 

209.1 

8.41 

3 

8 

157.4 

6 

9 

Am 

155.5 

PAUSE 


79-20-C-4 


C-4 


NARRATIVE  MESSAGE  STATEMENTS 

ENTER DEPARTURE MESSAGE, TRACK 451, TIME 1700, ALTITUDE 350

CHANGE  AIRCRAFT  IDENTITY  OF  TRACK  140  TO  A27813 

ENTER  DEPARTURE,  TRACK  244  TIME  1509,  ALTITUDE  225 

HANDOFF  TRACK  921  TO  SECTOR  66 

REQUEST  DISPLAY  OF  FLIGHT  PLAN  756 

DROP PLAN AND TRACK FOR TRACK 843

AMEND NUMBER 817, IDENTITY ALLEGHENY 0278 TYPE DEHAVILLAND 64,
DISCRETE  CODE,  TIME  0815  HOURS 

ACCEPT  HANDOFF,  TRACK  NUMBER  564 

HANDOFF  TRACK  445  TO  SECTOR  88 

ACCEPT  HANDOFF,  TRACK  NUMBER  558 

REQUEST  STRIP  FOR  TRACK  377 

AMEND  COORDINATION  FIX  AND  TIME  OF  TRACK  310, 

LAKE  HENRY  AT  0547  HOURS 

ENTER  DEPARTURE  MESSAGE,  TRACK  448,  TIME  0806,  ALTITUDE  290 

REQUEST  WEATHER  FOR  HAZELTON 

REQUEST  DISPLAY  OF  FLIGHT  PLAN  939 

HOLDING  TRACK  359,  0947  HOURS,  AT  HUGUENOT 

AMEND  ASSIGNED  ALTITUDE  OF  TRACK  362  TO  FLIGHT  LEVEL  290 

ENTER  DEPARTURE  MESSAGE,  TRACK  756,  TIME  0634,  ALTITUDE  290 

HANDOFF  TRACK  632  TO  SECTOR  92 

REQUEST  DISPLAY  OF  FLIGHT  PLAN  713 

DROP  TRACK  AND  PLAN  FOR  TRACK  581 

AMEND  NUMBER  837,  ASSIGNED  ALTITUDE  175,  SPEED  425, 
COORDINATION  FIX  TOBYHANNA 

ACCEPT  HANDOFF,  TRACK  NUMBER  412 


HANDOFF  TRACK  549  TO  SECTOR  90 
REQUEST  DISPLAY  OF  FLIGHT  PLAN  976 


79-20-D-l 


MASTER LIST
25 VOICE MESSAGES


MESSAGE                                          WORDS

DM 451 1700 08 350 (ENTER)                       13
AM 140 02 A27813 (ENTER)                         12
DM 244 1509 08 225 (ENTER)                       13
HO 921 66 (ENTER)                                7
FR 756 (ENTER)                                   5
RS 843 (ENTER)                                   5
AM 817 02 AL0278 03 DH64 /U 07 0815 (ENTER)      22
HO 564 (ENTER)                                   5
HO 445 88 (ENTER)                                7
HO 558 (ENTER)                                   5
SR 377 (ENTER)                                   5
AM 310 06 LHY 07 0547 (ENTER)                    12
DM 448 0806 08 290 (ENTER)                       13
WR HZL (ENTER)                                   3
FR 939 (ENTER)                                   5
HM 359 0947 HUO (ENTER)                          10
AM 362 08 290 (ENTER)                            9
DM 756 0634 08 290 (ENTER)                       13
HO 632 92 (ENTER)                                7
FR 713 (ENTER)                                   5
RS 581 (ENTER)                                   5
AM 837 08 175 05 425 06 TSD (ENTER)              15
HO 412 (ENTER)                                   5
HO 549 90 (ENTER)                                7
FR 976 (ENTER)                                   5


79-20-D-2 


D-2 


                                              CHARACTERS

DM 451 1700 350 (ENTER)                               16
AM 140 AID A27813 (ENTER)                             18
DM 244 1509 225 (ENTER)                               16
HO 921 66 (ENTER)                                      9
FR 756 (ENTER)                                         7
RS 843 (ENTER)                                         7
AM 817 AID AL0278 TYP DH64/U TIM 0815 (ENTER)         38
HO 564 (ENTER)                                         7
HO 445 88 (ENTER)                                      9
HO 558 (ENTER)                                         7
SR 377 (ENTER)                                         7
AM 310 FIX LHY TIM 0547 (ENTER)                       24
DM 448 0806 290 (ENTER)                               16
WR HZL (ENTER)                                         7
FR 939 (ENTER)                                         7
HM 359 0947 HUO (ENTER)                               16
AM 362 ALT 290 (ENTER)                                15
DM 756 0634 290 (ENTER)                               16
HO 632 92 (ENTER)                                      9
FR 713 (ENTER)                                         7
RS 581 (ENTER)                                         7
AM 837 ALT 175 SPD 425 FIX TSD (ENTER)                31
HO 412 (ENTER)                                         7
HO 549 90 (ENTER)                                      9
FR 976 (ENTER)                                         7
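The two count columns above can be tallied with a few lines of Python. This is only an illustrative sketch: the lists are copied from the WORDS and CHARACTERS columns in message order, and the function name `summarize` is a hypothetical helper, not something taken from the report.

```python
# Per-message counts copied from the WORDS and CHARACTERS columns above,
# in message order (25 entries each).
WORDS = [13, 12, 13, 7, 5, 5, 22, 5, 7, 5, 5, 12, 13,
         3, 5, 10, 9, 13, 7, 5, 5, 15, 5, 7, 5]
CHARACTERS = [16, 18, 16, 9, 7, 7, 38, 7, 9, 7, 7, 24, 16,
              7, 7, 16, 15, 16, 9, 7, 7, 31, 7, 9, 7]


def summarize(counts):
    """Return (total, mean per message) for one column of counts."""
    total = sum(counts)
    return total, total / len(counts)


word_total, word_mean = summarize(WORDS)
char_total, char_mean = summarize(CHARACTERS)
print(f"words:      total={word_total}, mean per message={word_mean:.2f}")
print(f"characters: total={char_total}, mean per message={char_mean:.2f}")
```

The totals describe the master lists only; they say nothing about how many words or keystrokes an operator actually produced, which is what the raw data listings record.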


79-20-D-3 


D-3 


RAW  DATA 

25  VOICE  MESSAGES 




DM 451 1700 08 350 (ENTER)
+00013

AM 140 02 A27Y13 (ENTER)
+00012

DM *244 1509 08 225 (ENTER)
+00014

HO 921 66 (ENTER)
+00005

FR 756 (ENTER)
+00004

CN *843 (ENTER)
+00006

AM **817 02 AL0278 03 DH64 */U *07 0*815 (ENTER)
+00052

HO 564 (ENTER)
+00004

HO 445 *88 (ENTER)
+00008

HO 558 (ENTER)
+00005

SR 377 (ENTER)
+00004

AM 310 06 LHY 07 0547 (ENTER)
+00014

DM 448 0806 08 290 (ENTER)
+00012

WR HZL (ENTER)
+00001

**FR 539 "
FR 939 (ENTER)
+00013

HM 359 **0947 HUO (ENTER)
+00017

AM 362 03 290 (ENTER)
+00008

DM 7*56 063*4 08 290 (ENTER)
+00018

HO 632 92 (ENTER)
+00006

**FR 713 (ENTER)
+00004

CN 5*1-
CN 581 (ENTER)
+00009

AM 837 08 175 05 425 06 TSD (ENTER)
+00021

HO 412 (ENTER)
+00004

HO 549 90 (ENTER)
+00006

*FR 976 (ENTER)
+00005


ERROR (SHOULD BE "8")
TIME IN SECONDS, FIRST WORD TO LAST WORD
REJECT
"ERASE"
"BACKSPACE"


79-20-E-l 


E-1 


RAW  DATA 

25  KEYBOARD  MESSAGES 


DM 5-
DM 451 1700 350 (ENTER)
+00017

AM 140 AID A27813 (ENTER)
+00016

DM 244 1509 ALT 225 (ENTER)
+00018

HO 921 66 (ENTER)
+00004

FR 756 (ENTER)
+00001

RS 845 (ENTER)
+00003

AM 817 AID AL0278 TYP DH64 /T TIM 0815 (ENTER)
+00072

HO 564 (ENTER)
+00002

HO 4B5 88 (ENTER)
+00004

HO 558 (ENTER)
+00002

RS 377 (ENTER)
+00003

AM 310 FIX LHY TIM 0547 (ENTER)
+00043

DM 448 0806 ALT 29'
DM 448 0806 28-
DM 448 0806 290 (ENTER)
+00023

WR HZL (ENTER)
+00010

FR 939 (ENTER)
+00004

HM 359 0937 HUO (ENTER)
+00022

AM 362 ALT 290 (ENTER)
+00010

DM 756 0634 ALT 290 (ENTER)
+00018

HO 632 92 (ENTER)
+00003

FR 713 (ENTER)
+00003

RS 583 (ENTER)
+00005

AM 837 ALT 175 SPD 475 FIX TSD (ENTER)
+00053

HO 412 (ENTER)
+00002

HO 549 90 (ENTER)
+00007

FR 976 (ENTER)
+00003


BACKSPACE
TIME IN SECONDS, FIRST KEY TO LAST KEY
LANGUAGE ERROR, WRONG QUALIFIER CODE
FORMAT ERROR (NO SPACE)
KEY ERROR (SHOULD BE "4")
KEY ERROR (SHOULD BE "4")

79-20-E-2 




PROCESSED  DATA 
25  VOICE  MESSAGES 


INPUT  STRING(S) 


DM  *244  1509  08  225  (ENTER) 


CN  *843  (ENTER) 


AM  **817  02  AL0278  03  DH64  */U  *07  0*815  (ENTER) 


**FR  539  ~ 

FR  939  (ENTER) 


HM  359  **0947  HUO  (ENTER) 


DM  7*56  063*4  08  290  (ENTER) 


**FR  713  (ENTER) 


CN  5*1 I 

INPUT  MSG  & MSG  TABLE  ENTRY  DO  NOT  MATCH 


LANGUAGE  ERROR:  OPERATOR  SAID  "CANCEL"  INSTEAD  OF  "DROP  TRACK" 


CN  581  (ENTER) 


DISPLACEMENT: 


+00021 


TOTAL BACKSPACE:   +00001
TOTAL ERASE:       +00001
TOTAL ERRORS:      +00001
TOTAL REJECTS:     +00018
TOTAL STRINGS:     +00052
TOTAL CHARACTERS:  +00335
TOTAL DURATION:    +00250


ERROR SUMMARY:
MASTER CHAR   INPUT CHAR


FUNCTION  COMPLETED 
TYPE  & FOR  INSTRUCTIONS 
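
The "INPUT MSG & MSG TABLE ENTRY DO NOT MATCH" entries and the MASTER CHAR / INPUT CHAR summary imply that each input string was compared with its master-list entry. How the original program made that comparison is not described in this listing. The following Python sketch shows one possible way to do it, using the standard difflib module; the function names `token_differences` and `char_pairs` are hypothetical, and the strings are taken from the master list and data listings (the last pair is illustrative only).

```python
from difflib import SequenceMatcher


def token_differences(master, entered):
    """List the non-matching spans between a master entry and an input string.

    Both strings are split on whitespace. Each result is a tuple
    (operation, master_tokens, entered_tokens).
    """
    m, e = master.split(), entered.split()
    matcher = SequenceMatcher(a=m, b=e, autojunk=False)
    return [
        (tag, m[i1:i2], e[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def char_pairs(master_token, entered_token):
    """Pair up the differing characters of two equal-length tokens."""
    return [(a, b) for a, b in zip(master_token, entered_token) if a != b]


# Voice message 21: the master list has RS 581, the operator said "CANCEL" (CN).
print(token_differences("RS 581 (ENTER)", "CN 581 (ENTER)"))

# Keyboard entry with a qualifier code (ALT) that the master entry does not use.
print(token_differences("DM 244 1509 225 (ENTER)", "DM 244 1509 ALT 225 (ENTER)"))

# Illustrative single-key slip, reported as a (master char, input char) pair.
print(char_pairs("0947", "0937"))
```

Comparing whole tokens first keeps a wrong command word or an extra qualifier code from being reported as a long run of character errors; the character-level pairing is then applied only to tokens of equal length.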


F-1 


PROCESSED  DATA 
25  KEYBOARD  MESSAGES 


INPUT  STRING(S) 


INPUT MSG & MSG TABLE ENTRY DO NOT MATCH
DM 244 1509 225 (ENTER)
DM 244 1509 ALT 225 (ENTER)
DISPLACEMENT:
+00003
FORMAT ERROR, CODE NOT REQUIRED

INPUT MSG & MSG TABLE ENTRY DO NOT MATCH
AM 817 AID AL0278 TYP DH64/U TIM 0815 (ENTER)
AM 817 AID AL0278 TYP DH64 /T TIM 0815 (ENTER)
DISPLACEMENT:
+00007
LANGUAGE ERROR, WRONG CODE

INPUT MSG & MSG TABLE ENTRY DO NOT MATCH
SR 377 (ENTER)
RS 377 (ENTER)
DISPLACEMENT:
+00011
LANGUAGE ERROR, WRONG CODE


INPUT MSG & MSG TABLE ENTRY DO NOT MATCH
DM 756 0634 290 (ENTER)
DM 756 0634 ALT 290 (ENTER)
DISPLACEMENT:
+00018
FORMAT ERROR, CODE NOT REQUIRED


TOTAL BACKSPACE:   +00002
TOTAL ERASE:       +00001
TOTAL ERRORS:      +00002
TOTAL REJECTS:     +00000
TOTAL STRINGS:     +00053
TOTAL CHARACTERS:  +00267
TOTAL DURATION:    +00237


ERROR SUMMARY:
MASTER CHAR   INPUT CHAR
     4             5
     4             3

FUNCTION  COMPLETED 
TYPE  & FOR  INSTRUCTIONS 


79-20-F-2 
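
The TOTAL CHARACTERS and TOTAL DURATION lines of the two processed-data summaries can be combined into a rough entry rate. The sketch below is illustrative only. It uses the totals as printed in this listing and assumes that TOTAL DURATION is in seconds, like the per-message times in the raw data; the listing does not state the unit, nor what the program counts as a character for voice input. The names `SUMMARIES` and `characters_per_second` are hypothetical.

```python
# Totals as printed in the two PROCESSED DATA summaries.
# Assumption: TOTAL DURATION is in seconds, like the per-message times.
SUMMARIES = {
    "voice": {"characters": 335, "duration": 250},
    "keyboard": {"characters": 267, "duration": 237},
}


def characters_per_second(summary):
    """Return total characters divided by total duration for one summary."""
    return summary["characters"] / summary["duration"]


for mode, summary in SUMMARIES.items():
    rate = characters_per_second(summary)
    print(f"{mode}: {rate:.2f} characters per second")
```

A single ratio like this hides the rejects, erasures, and re-entries counted in the same summaries, so it is best read alongside those totals rather than on its own.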


